Abstract

Conjunctive queries play an important role as an expressive query language for Description
Logics (DLs). Although modern DLs usually provide for transitive roles, conjunctive
query answering over DL knowledge bases is only poorly understood if transitive roles are
admitted in the query. In this paper, we consider unions of conjunctive queries over
knowledge bases formulated in the prominent DL SHIQ and allow transitive roles in both the
query and the knowledge base. We show decidability of query answering in this setting
and establish two tight complexity bounds: regarding combined complexity, we prove that
there is a deterministic algorithm for query answering that needs time single exponential
in the size of the KB and double exponential in the size of the query, which is optimal.
Regarding data complexity, we prove containment in co-NP.



1. Introduction 

Description Logics (DLs) are a family of logic-based knowledge representation formalisms
(Baader, Calvanese, McGuinness, Nardi, & Patel-Schneider, 2003). Most DLs are fragments 
of First-Order Logic restricted to unary and binary predicates, which are called concepts and 
roles in DLs. The constructors for building complex expressions are usually chosen such that 
the key inference problems, such as concept satisfiability, are decidable and preferably of low 
computational complexity. A DL knowledge base (KB) consists of a TBox, which contains 
intensional knowledge such as concept definitions and general background knowledge, and 
an ABox, which contains extensional knowledge and is used to describe individuals. Using 
a database metaphor, the TBox corresponds to the schema, and the ABox corresponds to 
the data. In contrast to databases, however, DL knowledge bases adopt an open world 
semantics, i.e., they represent information about the domain in an incomplete way. 

Standard DL reasoning services include testing concepts for satisfiability and retrieving 
certain instances of a given concept. The latter retrieves, for a knowledge base consisting of 
an ABox A and a TBox T, all (ABox) individuals that are instances of the given (possibly 
complex) concept expression C, i.e., all those individuals a such that T and A entail that a 
is an instance of C. The underlying reasoning problems are well-understood, and it is known 
that the combined complexity of these reasoning problems, i.e., the complexity measured in 
the size of the TBox, the ABox, and the query, is ExpTime-complete for SHIQ (Tobies,
2001). The data complexity of a reasoning problem is measured in the size of the ABox
only. Whenever the TBox and the query are small compared to the ABox, as is often the
case in practice, the data complexity gives a more useful performance estimate. For SHIQ,
instance retrieval is known to be data complete for co-NP (Hustadt, Motik, & Sattler,
2005).

Despite the high worst-case complexity of the standard reasoning problems for very
expressive DLs such as SHIQ, there are highly optimized implementations available, e.g.,
FaCT++ (Tsarkov & Horrocks, 2006), KAON2 (footnote 1), Pellet (Sirin, Parsia, Cuenca Grau,
Kalyanpur, & Katz, 2006), and RacerPro (footnote 2). These systems are used in a wide range
of applications, e.g., configuration (McGuinness & Wright, 1998), bioinformatics
(Wolstencroft, Brass, Horrocks, Lord, Sattler, Turi, & Stevens, 2005), and information
integration (Calvanese, De Giacomo, Lenzerini, Nardi, & Rosati, 1998b). Most prominently,
DLs are known for their use as a logical underpinning of ontology languages, e.g., OIL,
DAML+OIL, and OWL (Horrocks, Patel-Schneider, & van Harmelen, 2003), which is a W3C
recommendation (Bechhofer, van Harmelen, Hendler, Horrocks, McGuinness, Patel-Schneider,
& Stein, 2004).

In data-intensive applications, querying KBs plays a central role. Instance retrieval 
is, in some aspects, a rather weak form of querying: although possibly complex concept 
expressions are used as queries, we can only query for tree-like relational structures, i.e., 
a DL concept cannot express arbitrary cyclic structures. This property is known as the 
tree model property and is considered an important reason for the decidability of most 
Modal and Description Logics (Grädel, 2001; Vardi, 1997). Conjunctive queries (CQs)
are well known in the database community and constitute an expressive query language
with capabilities that go well beyond standard instance retrieval. For an example, consider
a knowledge base that contains an ABox assertion
$(\exists hasSon.(\exists hasDaughter.\top))(Mary)$,
which informally states that the individual (or constant in FOL terms) Mary has a son
who has a daughter; hence, that Mary is a grandmother. Additionally, we assume that
both roles hasSon and hasDaughter have a transitive super-role hasDescendant. This
implies that Mary is related via the role hasDescendant to her (anonymous) grandchild. For
this knowledge base, Mary is clearly an answer to the conjunctive query
$hasSon(x, y) \wedge hasDaughter(y, z) \wedge hasDescendant(x, z)$, when we assume that x is a
distinguished variable (also called answer or free variable) and y, z are non-distinguished
(existentially quantified) variables.

If all variables in the query are non-distinguished, the query answer is just true or false 
and the query is called a Boolean query. Given a knowledge base $\mathcal{K}$ and a Boolean CQ q, the
query entailment problem is deciding whether q is true or false w.r.t. $\mathcal{K}$. If a CQ contains
distinguished variables, the answers to the query are those tuples of individual names for 
which the knowledge base entails the query that is obtained by replacing the free variables 
with the individual names in the answer tuple. The problem of finding all answer tuples is 
known as query answering. Since query entailment is a decision problem and thus better 
suited for complexity analysis than query answering, we concentrate on query entailment. 
This is no restriction since query answering can easily be reduced to query entailment as 
we illustrate in more detail in Section 2.2. 

1. http://kaon2.semanticweb.org 

2. http://www.racer-systems.com

Devising a decision procedure for conjunctive query entailment in expressive DLs such as
SHIQ is a challenging problem, in particular when transitive roles are admitted in the query
(Glimm, Horrocks, & Sattler, 2006). In the conference version of this paper, we presented
the first decision procedure for conjunctive query entailment in SHIQ. In this paper, we
generalize this result to unions of conjunctive queries (UCQs) over SHIQ knowledge bases.
We achieve this by rewriting a conjunctive query into a set of conjunctive queries such that 
each resulting query is either tree-shaped (i.e., it can be expressed as a concept) or grounded 
(i.e., it contains only constants/individual names and no variables). The entailment of both 
types of queries can be reduced to standard reasoning problems (Horrocks & Tessaris, 2000; 
Calvanese, De Giacomo, & Lenzerini, 1998a). 

The paper is organized as follows: in Section 2, we give the necessary definitions, followed 
by a discussion of related work in Section 3. In Section 4, we motivate the query rewriting 
steps by means of an example. In Section 5, we give formal definitions for the rewriting 
procedure and show that a Boolean query is indeed entailed by a knowledge base $\mathcal{K}$ iff the
disjunction of the rewritten queries is entailed by $\mathcal{K}$. In Section 6, we present a
deterministic algorithm for UCQ entailment in SHIQ that runs in time single exponential in
the size of the knowledge base and double exponential in the size of the query. Since the
combined complexity of conjunctive query entailment is already 2ExpTime-hard for the DL ALCI
(Lutz, 2007), it follows that this problem is 2ExpTime-complete for SHIQ. This shows
that conjunctive query entailment for SHIQ is strictly harder than instance checking,
which is also the case for simpler DLs such as EL (Rosati, 2007b). We further show that
(the decision problem corresponding to) conjunctive query answering in SHIQ is
co-NP-complete regarding data complexity, and thus not harder than instance retrieval.

The presented decision procedure gives not only insight into query answering; it also has 
an immediate consequence on the field of extending DL knowledge bases with rules. From 
the work by Rosati (2006a, Thm. 11), the consistency of a SHIQ knowledge base extended
with (weakly-safe) Datalog rules is decidable iff the entailment of unions of conjunctive
queries in SHIQ is decidable. Hence, we close this open problem as well.

This paper is an extended version of the conference paper: Conjunctive Query Answering
for the Description Logic SHIQ. Proceedings of the Twentieth International Joint
Conference on Artificial Intelligence (IJCAI'07), Jan 06 - 12, 2007.

2. Preliminaries 

We introduce the basic terms and notations used throughout the paper. In particular, we 
introduce the DL SHIQ (Horrocks, Sattler, & Tobies, 2000) and (unions of) conjunctive
queries.

2.1 Syntax and Semantics of SHIQ

Let $N_C$, $N_R$, and $N_I$ be countably infinite sets of concept names, role names, and individual
names. We assume that the set of role names contains a subset $N_{tR} \subseteq N_R$ of transitive role
names. A role is an element of $N_R \cup \{r^- \mid r \in N_R\}$, where roles of the form $r^-$ are called
inverse roles. A role inclusion is of the form $r \sqsubseteq s$ with r, s roles. A role hierarchy
$\mathcal{R}$ is a finite set of role inclusions.

An interpretation $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ consists of a non-empty set $\Delta^{\mathcal{I}}$, the domain of
$\mathcal{I}$, and a function $\cdot^{\mathcal{I}}$, which maps every concept name A to a subset
$A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$, every role name $r \in N_R$ to a binary relation
$r^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$, every role name $r \in N_{tR}$ to a transitive binary relation
$r^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$, and every individual name a to an element
$a^{\mathcal{I}} \in \Delta^{\mathcal{I}}$. An interpretation $\mathcal{I}$ satisfies a role inclusion $r \sqsubseteq s$ if
$r^{\mathcal{I}} \subseteq s^{\mathcal{I}}$ and a role hierarchy $\mathcal{R}$ if it satisfies all role inclusions in $\mathcal{R}$.

We use the following standard notation: 

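1. We define the function Inv over roles as $Inv(r) := r^-$ if $r \in N_R$ and $Inv(r) := s$ if
$r = s^-$ for a role name s.

2. For a role hierarchy $\mathcal{R}$, we define $\sqsubseteq^*_{\mathcal{R}}$ as the reflexive transitive closure of
$\sqsubseteq$ over $\mathcal{R} \cup \{Inv(r) \sqsubseteq Inv(s) \mid r \sqsubseteq s \in \mathcal{R}\}$. We use $r \equiv_{\mathcal{R}} s$ as an
abbreviation for $r \sqsubseteq^*_{\mathcal{R}} s$ and $s \sqsubseteq^*_{\mathcal{R}} r$.

3. For a role hierarchy $\mathcal{R}$ and a role s, we define the set $Trans_{\mathcal{R}}$ of transitive roles as
$\{s \mid \text{there is a role } r \text{ with } r \equiv_{\mathcal{R}} s \text{ and } r \in N_{tR} \text{ or } Inv(r) \in N_{tR}\}$.

4. A role r is called simple w.r.t. a role hierarchy $\mathcal{R}$ if, for each role s such that
$s \sqsubseteq^*_{\mathcal{R}} r$, $s \notin Trans_{\mathcal{R}}$.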
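As an illustration of the four notions above, the following Python sketch computes the closure of a role hierarchy, the set of transitive roles, and the simple roles for a finite set of role names. The encoding is an assumption made only for this example: roles are strings, an inverse role is written with a trailing "-", and the function names are hypothetical.

```python
from itertools import product


def inv(role):
    """Inv(r): maps r to r- and s- back to s (inverse marked by a trailing '-')."""
    return role[:-1] if role.endswith("-") else role + "-"


def closure(inclusions, role_names):
    """Reflexive transitive closure of the inclusions and their inverse counterparts.

    Returns the set of all roles and the closure as a set of pairs (r, s)."""
    roles = set(role_names) | {inv(r) for r in role_names}
    sub = {(r, r) for r in roles}
    for r, s in inclusions:
        sub |= {(r, s), (inv(r), inv(s))}
    changed = True
    while changed:
        new = {(a, d) for (a, b), (c, d) in product(sub, sub) if b == c}
        changed = not new <= sub
        sub |= new
    return roles, sub


def transitive_roles(roles, sub, transitive_names):
    """Roles s equivalent to some r such that r or Inv(r) is a transitive role name."""
    return {s for s in roles for r in roles
            if (r, s) in sub and (s, r) in sub
            and (r in transitive_names or inv(r) in transitive_names)}


def simple_roles(roles, sub, trans):
    """Roles that have no transitive sub-role."""
    return {r for r in roles
            if not any((s, r) in sub and s in trans for s in roles)}
```

For the introductory example, with hasSon and hasDaughter below the transitive role hasDescendant, this sketch classifies hasDescendant and its inverse as transitive and the remaining four roles as simple.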

The subscript $\mathcal{R}$ of $\sqsubseteq^*_{\mathcal{R}}$ and $Trans_{\mathcal{R}}$ is dropped if clear from the context. The set of
SHIQ-concepts (or concepts for short) is the smallest set built inductively from $N_C$ using the
following grammar, where $A \in N_C$, $n \in \mathbb{N}$, r is a role and s is a simple role:

$C ::= \top \mid \bot \mid A \mid \neg C \mid C_1 \sqcap C_2 \mid C_1 \sqcup C_2 \mid \forall r.C \mid \exists r.C \mid \leqslant n\, s.C \mid \geqslant n\, s.C.$

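Given an interpretation $\mathcal{I}$, the semantics of SHIQ-concepts is defined as follows:

$\top^{\mathcal{I}} = \Delta^{\mathcal{I}}$
$\bot^{\mathcal{I}} = \emptyset$
$(\neg C)^{\mathcal{I}} = \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}}$
$(C \sqcap D)^{\mathcal{I}} = C^{\mathcal{I}} \cap D^{\mathcal{I}}$
$(C \sqcup D)^{\mathcal{I}} = C^{\mathcal{I}} \cup D^{\mathcal{I}}$
$(\forall r.C)^{\mathcal{I}} = \{d \in \Delta^{\mathcal{I}} \mid \text{if } (d, d') \in r^{\mathcal{I}}, \text{ then } d' \in C^{\mathcal{I}}\}$
$(\exists r.C)^{\mathcal{I}} = \{d \in \Delta^{\mathcal{I}} \mid \text{there is a } (d, d') \in r^{\mathcal{I}} \text{ with } d' \in C^{\mathcal{I}}\}$
$(\leqslant n\, s.C)^{\mathcal{I}} = \{d \in \Delta^{\mathcal{I}} \mid \sharp(s^{\mathcal{I}}(d, C)) \le n\}$
$(\geqslant n\, s.C)^{\mathcal{I}} = \{d \in \Delta^{\mathcal{I}} \mid \sharp(s^{\mathcal{I}}(d, C)) \ge n\}$

where $\sharp(M)$ denotes the cardinality of the set M and $s^{\mathcal{I}}(d, C)$ is defined as

$\{d' \in \Delta^{\mathcal{I}} \mid (d, d') \in s^{\mathcal{I}} \text{ and } d' \in C^{\mathcal{I}}\}.$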
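To make the semantics concrete, the following Python sketch evaluates a concept over a finite interpretation. The nested-tuple encoding of concepts, the function name `ext`, and the dictionary representation of the interpretation are assumptions of this example; interpretations in general need not be finite, and the extensions of inverse roles are assumed to be listed explicitly.

```python
def ext(concept, domain, conc, role):
    """Return the extension of `concept` in a finite interpretation.

    Concepts are nested tuples (an encoding assumed for this sketch):
    ("top",), ("bot",), ("name", A), ("not", C), ("and", C, D), ("or", C, D),
    ("all", r, C), ("some", r, C), ("atmost", n, s, C), ("atleast", n, s, C).
    `conc` maps concept names to sets, `role` maps roles to sets of pairs.
    """
    def rec(c):
        return ext(c, domain, conc, role)

    op = concept[0]
    if op == "top":
        return set(domain)
    if op == "bot":
        return set()
    if op == "name":
        return set(conc.get(concept[1], ()))
    if op == "not":
        return set(domain) - rec(concept[1])
    if op == "and":
        return rec(concept[1]) & rec(concept[2])
    if op == "or":
        return rec(concept[1]) | rec(concept[2])
    if op not in ("all", "some", "atmost", "atleast"):
        raise ValueError(f"unknown constructor: {op}")

    # The role is the second-to-last component, the filler concept the last one.
    pairs = role.get(concept[-2], set())
    filler = rec(concept[-1])

    def succ(d, only_filler=True):
        # s^I(d, C) if only_filler is True, otherwise all successors of d
        return {e for (x, e) in pairs if x == d and (e in filler or not only_filler)}

    if op == "all":
        return {d for d in domain if succ(d, only_filler=False) <= filler}
    if op == "some":
        return {d for d in domain if succ(d)}
    if op == "atmost":
        return {d for d in domain if len(succ(d)) <= concept[1]}
    return {d for d in domain if len(succ(d)) >= concept[1]}  # atleast
```

In an interpretation where one element has a hasSon-successor that in turn has a hasDaughter-successor, the concept `("some", "hasSon", ("some", "hasDaughter", ("top",)))` evaluates to the set containing only the first element.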

A general concept inclusion (GCI) is an expression $C \sqsubseteq D$, where both C and D are
concepts. A finite set of GCIs is called a TBox. An interpretation $\mathcal{I}$ satisfies a GCI
$C \sqsubseteq D$ if $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$, and a TBox $\mathcal{T}$ if it satisfies each GCI in $\mathcal{T}$.

An (ABox) assertion is an expression of the form C(a), r(a, b), $\neg r(a, b)$, or $a \neq b$, where
C is a concept, r is a role, and $a, b \in N_I$. An ABox is a finite set of assertions. We use
$Inds(\mathcal{A})$ to denote the set of individual names occurring in $\mathcal{A}$. An interpretation
$\mathcal{I}$ satisfies an assertion C(a) if $a^{\mathcal{I}} \in C^{\mathcal{I}}$, r(a, b) if $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in r^{\mathcal{I}}$,
$\neg r(a, b)$ if $(a^{\mathcal{I}}, b^{\mathcal{I}}) \notin r^{\mathcal{I}}$, and $a \neq b$ if $a^{\mathcal{I}} \neq b^{\mathcal{I}}$. An
interpretation $\mathcal{I}$ satisfies an ABox $\mathcal{A}$ if it satisfies each assertion in $\mathcal{A}$, which we
denote with $\mathcal{I} \models \mathcal{A}$.

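A knowledge base (KB) is a triple $(\mathcal{T}, \mathcal{R}, \mathcal{A})$ with $\mathcal{T}$ a TBox, $\mathcal{R}$ a role hierarchy, and
$\mathcal{A}$ an ABox. Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a KB and $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ an interpretation. We
say that $\mathcal{I}$ satisfies $\mathcal{K}$ if $\mathcal{I}$ satisfies $\mathcal{T}$, $\mathcal{R}$, and $\mathcal{A}$. In this case, we say that
$\mathcal{I}$ is a model of $\mathcal{K}$ and write $\mathcal{I} \models \mathcal{K}$. We say that $\mathcal{K}$ is consistent if $\mathcal{K}$ has a model.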

2.1.1 Extending SHIQ to $SHIQ^{\sqcap}$

In the following section, we show how we can reduce a conjunctive query to a set of ground
or tree-shaped conjunctive queries. During the reduction, we may introduce concepts that
contain an intersection of roles under existential quantification. We define, therefore, the
extension of SHIQ with role conjunction/intersection, denoted as $SHIQ^{\sqcap}$ and, in the
appendix, we show how to decide the consistency of $SHIQ^{\sqcap}$ knowledge bases.

In addition to the constructors introduced for SHIQ, $SHIQ^{\sqcap}$ allows for concepts of
the form

$C ::= \forall R.C \mid \exists R.C \mid \leqslant n\, S.C \mid \geqslant n\, S.C,$

where $R := r_1 \sqcap \ldots \sqcap r_n$, $S := s_1 \sqcap \ldots \sqcap s_n$, $r_1, \ldots, r_n$ are roles, and
$s_1, \ldots, s_n$ are simple roles. The interpretation function is extended such that
$(r_1 \sqcap \ldots \sqcap r_n)^{\mathcal{I}} = r_1^{\mathcal{I}} \cap \ldots \cap r_n^{\mathcal{I}}$.

2.2 Conjunctive Queries and Unions of Conjunctive Queries 

We now introduce Boolean conjunctive queries since they are the basic form of queries we 
are concerned with. We later also define non-Boolean queries and show how they can be 
reduced to Boolean queries. Finally, unions of conjunctive queries are just a disjunction of 
conjunctive queries. 

For simplicity, we write a conjunctive query as a set instead of as a conjunction of atoms. 
For example, we write the introductory example from Section 1 as 

{hasSon(x, y), hasDaughter(y, z), hasDescendant(x, z)}. 

For non-Boolean queries, i.e., when we consider the problem of query answering, the 
answer variables are often given in the head of the query, e.g., 

$(x_1, x_2, x_3) \leftarrow \{hasSon(x_1, x_2), hasDaughter(x_2, x_3), hasDescendant(x_1, x_3)\}$

indicates that the query answers are those tuples $(a_1, a_2, a_3)$ of individual names that,
substituted for $x_1$, $x_2$, and $x_3$ respectively, result in a Boolean query that is entailed by the
knowledge base. For simplicity and since we mainly focus on query entailment, we do not 
use a query head even in the case of a non-Boolean query. Instead, we explicitly say which 
variables are answer variables and which ones are existentially quantified. We now give a 
definition of Boolean conjunctive queries. 

Definition 1. Let $N_V$ be a countably infinite set of variables disjoint from $N_C$, $N_R$, and $N_I$.
A term t is an element from $N_V \cup N_I$. Let C be a concept, r a role, and t, t' terms. An atom
is an expression C(t), r(t, t'), or $t \approx t'$ and we refer to these three different types of atoms
as concept atoms, role atoms, and equality atoms respectively. A Boolean conjunctive query
q is a non-empty set of atoms. We use Vars(q) to denote the set of (existentially quantified)
variables occurring in q, Inds(q) to denote the set of individual names occurring in q, and
Terms(q) for the set of terms in q, where $Terms(q) = Vars(q) \cup Inds(q)$. If all terms in q
are individual names, we say that q is ground. A sub-query of q is simply a subset of q
(including q itself). As usual, we use $\sharp(q)$ to denote the cardinality of q, which is simply the
number of atoms in q, and we use |q| for the size of q, i.e., the number of symbols necessary
to write q. A SHIQ conjunctive query is a conjunctive query in which all concepts C that
occur in a concept atom C(t) are SHIQ-concepts.

Since equality is reflexive, symmetric and transitive, we define $\approx^*$ as the transitive,
reflexive, and symmetric closure of $\approx$ over the terms in q. Hence, the relation $\approx^*$ is an
equivalence relation over the terms in q and, for $t \in Terms(q)$, we use [t] to denote the
equivalence class of t by $\approx^*$.

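Let $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ be an interpretation. A total function $\pi\colon Terms(q) \to \Delta^{\mathcal{I}}$ is an
evaluation if (i) $\pi(a) = a^{\mathcal{I}}$ for each individual name $a \in Inds(q)$ and (ii) $\pi(t) = \pi(t')$ for all
$t \approx^* t'$. We write

• $\mathcal{I} \models^{\pi} C(t)$ if $\pi(t) \in C^{\mathcal{I}}$;

• $\mathcal{I} \models^{\pi} r(t, t')$ if $(\pi(t), \pi(t')) \in r^{\mathcal{I}}$;

• $\mathcal{I} \models^{\pi} t \approx t'$ if $\pi(t) = \pi(t')$.

If, for an evaluation $\pi$, $\mathcal{I} \models^{\pi} at$ for all atoms $at \in q$, we write $\mathcal{I} \models^{\pi} q$. We say that
$\mathcal{I}$ satisfies q and write $\mathcal{I} \models q$ if there exists an evaluation $\pi$ such that $\mathcal{I} \models^{\pi} q$. We call
such a $\pi$ a match for q in $\mathcal{I}$.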
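The notion of a match can be illustrated by a brute-force search over a single finite interpretation, as in the following Python sketch. The atom encoding is an assumption of this example: a concept atom is a pair (A, t) with a concept name A, a role atom is a triple (r, t, t'), an equality atom is ("=", t, t'), and variables are strings starting with "?". The sketch only checks whether one given interpretation satisfies q; it does not decide entailment, which quantifies over all models of the knowledge base.

```python
from itertools import product


def holds(atom, pi, conc, role):
    """Check a single atom under the evaluation pi."""
    if len(atom) == 2:                      # concept atom (A, t)
        name, t = atom
        return pi[t] in conc.get(name, set())
    name, t, u = atom
    if name == "=":                         # equality atom
        return pi[t] == pi[u]
    return (pi[t], pi[u]) in role.get(name, set())  # role atom


def find_match(query, domain, conc, role, ind):
    """Return an evaluation satisfying all atoms of `query`, or None.

    `ind` maps every individual name used in the query to its domain element,
    so condition (i) of the definition holds by construction."""
    variables = sorted({t for atom in query for t in atom[1:] if t.startswith("?")})
    for values in product(sorted(domain), repeat=len(variables)):
        pi = dict(zip(variables, values))
        pi.update(ind)
        if all(holds(atom, pi, conc, role) for atom in query):
            return pi
    return None
```

The search tries every assignment of domain elements to variables, so its running time is exponential in the number of variables; it is meant only to mirror the definition.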

Let $\mathcal{K}$ be a SHIQ knowledge base and q a conjunctive query. If $\mathcal{I} \models \mathcal{K}$ implies $\mathcal{I} \models q$,
we say that $\mathcal{K}$ entails q and write $\mathcal{K} \models q$. △

The query entailment problem is defined as follows: given a knowledge base $\mathcal{K}$ and a
query q, decide whether $\mathcal{K} \models q$.

For brevity and simplicity of notation, we define the relation $\in^*$ over atoms in q as follows:
$C(t) \in^* q$ if there is a term $t' \in Terms(q)$ such that $t \approx^* t'$ and $C(t') \in q$, and
$r(t_1, t_2) \in^* q$ if there are terms $t_1', t_2' \in Terms(q)$ such that $t_1 \approx^* t_1'$, $t_2 \approx^* t_2'$, and
$r(t_1', t_2') \in q$ or $Inv(r)(t_2', t_1') \in q$.
This is clearly justified by definition of the semantics, in particular, because $\mathcal{I} \models r(t, t')$
implies that $\mathcal{I} \models Inv(r)(t', t)$.

When devising a decision procedure for CQ entailment, most complications arise from 
cyclic queries (Calvanese et al., 1998a; Chekuri & Rajaraman, 1997). In this context, when 
we say cyclic, we mean that the graph structure induced by the query is cyclic, i.e., the graph 
obtained from q such that each term is considered as a node and each role atom induces 
an edge. Since, in the presence of inverse roles, a query containing the role atom r(t, t') is
equivalent to the query obtained by replacing this atom with Inv(r)(t', t), the direction of
the edges is not important and we say that a query is cyclic if its underlying undirected 
graph structure is cyclic. Please note also that multiple role atoms for two terms are not 
considered as a cycle, e.g., the query {r(t,t'),s(t,t')} is not a cyclic query. The following is 
a more formal definition of this property. 

Definition 2. A query q is cyclic if there exists a sequence of terms $t_1, \ldots, t_n$ with n > 3
such that

1. for each i with $1 \le i < n$, there exists a role atom $r_i(t_i, t_{i+1}) \in^* q$,

2. $t_1 = t_n$, and

3. $t_i \not\approx^* t_j$ for $1 \le i < j < n$. △

In the above definition, Item 3 makes sure that we do not consider queries as cyclic just
because they contain two terms t, t' for which there are two or more role atoms using the
two terms. Please note that we use the relation $\in^*$ here, which implicitly uses the relation
$\approx^*$ and abstracts from the directedness of role atoms.
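Under the reading that a query is cyclic exactly when the undirected graph over equivalence classes of terms, with parallel edges collapsed and self-loops ignored, contains a cycle, the definition can be checked with two union-find passes. The Python sketch below is an illustration under that reading and uses an assumed atom encoding: (A, t) for concept atoms, (r, t, t') for role atoms, and ("=", t, t') for equality atoms.

```python
def is_cyclic(query):
    """Decide cyclicity of a query given as a set of atom tuples."""
    def find(parent, x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    eq = {}  # equivalence classes induced by the equality atoms
    for atom in query:
        if len(atom) == 3 and atom[0] == "=":
            eq[find(eq, atom[1])] = find(eq, atom[2])

    # Distinct undirected edges between different classes; direction of the
    # role atom and multiple atoms over the same two terms are irrelevant.
    edges = set()
    for atom in query:
        if len(atom) == 3 and atom[0] != "=":
            a, b = find(eq, atom[1]), find(eq, atom[2])
            if a != b:
                edges.add(frozenset((a, b)))

    comp = {}  # connected components of the graph built so far
    for edge in edges:
        a, b = (find(comp, x) for x in edge)
        if a == b:      # the edge closes a cycle
            return True
        comp[a] = b
    return False
```

On the introductory example query the function returns True, whereas it returns False for {r(t, t'), s(t, t')}, in line with the remark on Item 3.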

In the following, if we write that we replace $r(t, t') \in^* q$ with $s(t_1, t_2), \ldots, s(t_{n-1}, t_n)$ for
$t = t_1$ and $t' = t_n$, we mean that we first remove any occurrences of $r(\hat{t}, \hat{t}')$ and
$Inv(r)(\hat{t}', \hat{t})$ such that $\hat{t} \approx^* t$ and $\hat{t}' \approx^* t'$ from q, and then add the atoms
$s(t_1, t_2), \ldots, s(t_{n-1}, t_n)$ to q.

W.l.o.g., we assume that queries are connected. More precisely, let q be a conjunctive
query. We say that q is connected if, for all $t, t' \in Terms(q)$, there exists a sequence
$t_1, \ldots, t_n$ such that $t_1 = t$, $t_n = t'$ and, for all $1 \le i < n$, there exists a role r such that
$r(t_i, t_{i+1}) \in^* q$. A collection $q_1, \ldots, q_n$ of queries is a partitioning of q if
$q = q_1 \cup \ldots \cup q_n$, $q_i \cap q_j = \emptyset$ for $1 \le i < j \le n$, and each $q_i$ is connected.

Lemma 3. Let $\mathcal{K}$ be a knowledge base, q a conjunctive query, and $q_1, \ldots, q_n$ a partitioning
of q. Then $\mathcal{K} \models q$ iff $\mathcal{K} \models q_i$ for each i with $1 \le i \le n$.

A proof is given by Tessaris (2001, 7.3.2) and, with this lemma, it is clear that the
restriction to connected queries is indeed w.l.o.g. since entailment of q can be decided by
checking entailment of each $q_i$ in turn. In what follows, we therefore assume queries to
be connected without further notice.
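A partitioning into connected sub-queries can be computed as the connected components of the query graph. The Python sketch below assumes the atom encoding (A, t) for concept atoms, (r, t, t') for role atoms, and ("=", t, t') for equality atoms; as a simplification made for this illustration, it also treats terms related by an equality atom as connected, since such terms fall into the same equivalence class.

```python
from collections import defaultdict


def partition(query):
    """Split a query (a set of atom tuples) into its connected sub-queries."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Terms that occur together in an atom belong to the same component.
    for atom in query:
        for t in atom[1:]:
            parent[find(t)] = find(atom[1])

    # Every atom goes to the component of its first term.
    groups = defaultdict(set)
    for atom in query:
        groups[find(atom[1])].add(atom)
    return list(groups.values())
```

By Lemma 3, entailment of q could then be decided by testing each returned sub-query separately.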

Definition 4. A union of Boolean conjunctive queries is a formula $q_1 \vee \ldots \vee q_n$, where each
disjunct $q_i$ is a Boolean conjunctive query.

A knowledge base $\mathcal{K}$ entails a union of Boolean conjunctive queries $q_1 \vee \ldots \vee q_n$, written
as $\mathcal{K} \models q_1 \vee \ldots \vee q_n$, if, for each interpretation $\mathcal{I}$ such that $\mathcal{I} \models \mathcal{K}$, there is some i
such that $\mathcal{I} \models q_i$ and $1 \le i \le n$. △

W.l.o.g. we assume that the variable names in each disjunct are different from the 
variable names in the other disjuncts. This can always be achieved by naming variables 
apart. We further assume that each disjunct is a connected conjunctive query. This is 
w.l.o.g. since a UCQ which contains unconnected disjuncts can always be transformed 
into conjunctive normal form; we can then decide entailment for each resulting conjunct 
separately and each conjunct is a union of connected conjunctive queries. We describe 
this transformation now in more detail and, for a more convenient notation, we write a
conjunctive query $\{at_1, \ldots, at_k\}$ as $at_1 \wedge \ldots \wedge at_k$ in the following proof, instead of the
usual set notation.

Lemma 5. Let $\mathcal{K}$ be a knowledge base, $q = q_1 \vee \ldots \vee q_n$ a union of conjunctive queries such
that, for $1 \le i \le n$, $q_i^1, \ldots, q_i^{k_i}$ is a partitioning of the conjunctive query $q_i$. Then
$\mathcal{K} \models q$ iff

$$\mathcal{K} \models \bigwedge_{(i_1, \ldots, i_n) \in \{1, \ldots, k_1\} \times \ldots \times \{1, \ldots, k_n\}} (q_1^{i_1} \vee \ldots \vee q_n^{i_n}).$$

Again, a detailed proof is given by Tessaris (2001, 7.3.3). Please note that, due to the
transformation into conjunctive normal form, the resulting number of unions of connected
conjunctive queries for which we have to test entailment can be exponential in the size of
the original query. When analysing the complexity of the decision procedures presented
in Section 6, we show that the assumption that each CQ in a UCQ is connected does not
increase the complexity.
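The transformation into conjunctive normal form amounts to choosing one connected part from every disjunct in all possible ways. The following Python sketch enumerates these choices; the function name and the list-of-lists input format are assumptions made for this illustration.

```python
from itertools import product


def connected_ucqs(partitioned_disjuncts):
    """Enumerate the conjuncts of the conjunctive normal form.

    partitioned_disjuncts[i] lists the connected parts of the i-th disjunct.
    Each returned list is one union of connected conjunctive queries, obtained
    by picking exactly one part from every disjunct."""
    return [list(choice) for choice in product(*partitioned_disjuncts)]
```

For n disjuncts with $k_1, \ldots, k_n$ parts, the result has $k_1 \cdot \ldots \cdot k_n$ entries, which makes the possible exponential blow-up explicit.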

We now make the connection between query entailment and query answering clearer. For 
query answering, let the variables of a conjunctive query be typed: each variable can either 
be existentially quantified (also called non-distinguished) or free (also called distinguished or
answer variables). Let q be a query in n variables (i.e., $\sharp(Vars(q)) = n$), of which $v_1, \ldots, v_m$
($m \le n$) are answer variables. The answers of $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ to q are those m-tuples
$(a_1, \ldots, a_m) \in Inds(\mathcal{A})^m$ such that, for all models $\mathcal{I}$ of $\mathcal{K}$, $\mathcal{I} \models^{\pi} q$ for some $\pi$ that
satisfies $\pi(v_i) = a_i^{\mathcal{I}}$ for all i with $1 \le i \le m$. It is not hard to see that the answers of
$\mathcal{K}$ to q can be computed by testing, for each $(a_1, \ldots, a_m) \in Inds(\mathcal{A})^m$, whether the query
$q_{[v_1, \ldots, v_m / a_1, \ldots, a_m]}$ obtained from q by replacing each occurrence of $v_i$ with $a_i$ for
$1 \le i \le m$ is entailed by $\mathcal{K}$. The answer to q is then the set of all m-tuples $(a_1, \ldots, a_m)$
for which $\mathcal{K} \models q_{[v_1, \ldots, v_m / a_1, \ldots, a_m]}$.
Let $k = \sharp(Inds(\mathcal{A}))$ be the number of individual names used in the ABox $\mathcal{A}$. Since $\mathcal{A}$ is
finite, clearly k is finite. Hence, deciding which tuples belong to the set of answers can be
checked with at most $k^m$ entailment tests. This is clearly not very efficient, but optimizations
can be used, e.g., to identify a (hopefully small) set of candidate tuples.
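The reduction from query answering to query entailment can be sketched in Python as follows. The parameter `entails` stands for a hypothetical decision procedure for Boolean query entailment and is not implemented here; the atom encoding (a name followed by its terms) and the function name are likewise assumptions of this example.

```python
from itertools import product


def answers(query, answer_vars, individuals, entails):
    """Naive query answering with one entailment test per candidate tuple.

    `entails` is a hypothetical oracle: it takes a Boolean query and returns
    True iff the knowledge base entails it."""
    result = set()
    for tup in product(sorted(individuals), repeat=len(answer_vars)):
        subst = dict(zip(answer_vars, tup))
        # Replace every occurrence of an answer variable by its individual name.
        grounded = frozenset(
            (atom[0],) + tuple(subst.get(t, t) for t in atom[1:]) for atom in query
        )
        if entails(grounded):
            result.add(tup)
    return result
```

With k individual names and m answer variables, the loop performs exactly $k^m$ calls to the oracle, matching the bound stated above.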

The algorithm that we present in Section 6 decides query entailment. The reasons for 
devising a decision procedure for query entailment instead of query answering are two- 
fold: first, query answering can be reduced to query entailment as shown above; second, in 
contrast to query answering, query entailment is a decision problem and can be studied in 
terms of complexity theory. 

In the remainder of this paper, if not stated otherwise, we use q (possibly with subscripts)
for a connected Boolean conjunctive query, $\mathcal{K}$ for a SHIQ knowledge base $(\mathcal{T}, \mathcal{R}, \mathcal{A})$, $\mathcal{I}$ for
an interpretation $(\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$, and $\pi$ for an evaluation.



3. Related Work 

Very recently, an automata-based decision procedure for positive existential path queries
over ALCQIb_reg knowledge bases has been presented (Calvanese, Eiter, & Ortiz, 2007).
Positive existential path queries generalize unions of conjunctive queries and since a SHIQ
knowledge base can be polynomially reduced to an ALCQIb_reg knowledge base, the
presented algorithm is a decision procedure for (union of) conjunctive query entailment in
SHIQ as well. The automata-based technique can be considered more elegant than our
rewriting algorithm, but it does not give an NP upper bound for the data complexity as
our technique does.

Most existing algorithms for conjunctive query answering in expressive DLs assume,
however, that role atoms in conjunctive queries use only roles that are not transitive. As a
consequence, the example query from the introductory section cannot be answered. Under
this restriction, decision procedures for various DLs around SHIQ are known (Horrocks &
Tessaris, 2000; Ortiz, Calvanese, & Eiter, 2006b), and it is known that answering conjunctive
queries in this setting is data complete for co-NP (Ortiz et al., 2006b). Another common
restriction is that only individuals named in the ABox are considered for the assignments
of variables. In this setting, the semantics of queries is no longer the standard First-Order
one. With this restriction, the answer to the example query from the introduction would be
false since Mary is the only named individual. It is not hard to see that conjunctive query
answering with this restriction can be reduced to standard instance retrieval by replacing
the variables with individual names from the ABox and then testing the entailment of
each conjunct separately. Most of the implemented DL reasoners, e.g., KAON2, Pellet,
and RacerPro, provide an interface for conjunctive query answering in this setting and
employ several optimizations to improve the performance (Sirin & Parsia, 2006; Motik,
Sattler, & Studer, 2004; Wessel & Möller, 2005). Pellet appears to be the only reasoner
that also supports the standard First-Order semantics for SHIQ conjunctive queries under
the restriction that the queries are acyclic.

To the best of our knowledge, it is still an open problem whether conjunctive query
entailment is decidable in SHOIQ. Regarding undecidability results, it is known that
conjunctive query entailment in the two-variable fragment of First-Order Logic $\mathcal{L}^2$ is
undecidable (Rosati, 2007a) and Rosati identifies a relatively small set of constructors that
causes the undecidability.

Query entailment and answering have also been studied in the context of databases 
with incomplete information (Rosati, 2006b; van der Meyden, 1998; Grahne, 1991). In this 
setting, DLs can be used as schema languages, but the expressivity of the considered DLs 
is much lower than the expressivity of SHIQ. For example, the constructors provided by
logics of the DL-Lite family (Calvanese, De Giacomo, Lembo, Lenzerini, & Rosati, 2007)
are chosen such that the standard reasoning tasks are in PTime and query entailment
is in LogSpace with respect to data complexity. Furthermore, TBox reasoning can be
done independently of the ABox and the ABox can be stored and accessed using a standard
database SQL engine. Since the considered DLs are considerably less expressive than SHIQ,
the techniques used in databases with incomplete information cannot be applied in our 
setting. 

Regarding the query language, it is well known that an extension of conjunctive queries 
with inequalities is undecidable (Calvanese et al., 1998a). Recently, it has further been 
shown that even for DLs with low expressivity, an extension of conjunctive queries with 
inequalities or safe role negation leads to undecidability (Rosati, 2007a). 

A related reasoning problem is query containment. Given a schema (or TBox) S and 
two queries q and q', we have that q is contained in q' w.r.t. S iff every interpretation $\mathcal{I}$
that satisfies S and q also satisfies q'. It is well known that query containment w.r.t. a
TBox can be reduced to deciding query entailment for (unions of) conjunctive queries w.r.t.
a knowledge base (Calvanese et al., 1998a). Hence a decision procedure for (unions of)
conjunctive queries in SHIQ can also be used for deciding query containment w.r.t. a
SHIQ TBox.

Entailment of unions of conjunctive queries is also closely related to the problem of 
adding rules to a DL knowledge base, e.g., in the form of Datalog rules. Augmenting a 
DL KB with an arbitrary Datalog program easily leads to undecidability (Levy & Rousset, 
1998). In order to ensure decidability, the interaction between the Datalog rules and the 
DL knowledge base is usually restricted by imposing a safeness condition. The DL+log
framework (Rosati, 2006a) provides the least restrictive integration proposed so far. Rosati
presents an algorithm that decides the consistency of a DL+log knowledge base by reducing
the problem to entailment of unions of conjunctive queries, and he proves that decidability
of UCQs in SHIQ implies the decidability of consistency for SHIQ+log knowledge bases.

4. Query Rewriting by Example 

In this section, we motivate the ideas behind our query rewriting technique by means of 
examples. In the following section, we give precise definitions for all rewriting steps. 

4.1 Forest Bases and Canonical Interpretations 

The main idea is that we can focus on models of the knowledge base that have a kind of 
tree or forest shape. It is well known that one reason for Description and Modal Logics 
being so robustly decidable is that they enjoy some form of tree model property, i.e., every 
satisfiable concept has a model that is tree-shaped (Vardi, 1997; Grädel, 2001). When going
from concept satisfiability to knowledge base consistency, we need to replace the tree model 
property with a form of forest model property, i.e., every consistent KB has a model that 
consists of a set of "trees", where each root corresponds to a named individual in the ABox.
The roots can be connected via arbitrary relational structures, induced by the role assertions 
given in the ABox. A forest model is, therefore, not a forest in the graph theoretic sense. 
Furthermore, transitive roles can introduce "short-cut" edges between elements within a 
tree or even between elements of different trees. Hence we talk of "a form of" forest model 
property. 

We now define forest models and show that, for deciding query entailment, we can 
restrict our attention to forest models. The rewriting steps are then used to transform cyclic 
subparts of the query into tree-shaped ones such that there is a "forest-shaped match" for 
the rewritten query into the forest models. 

In order to make the forest model property even clearer, we also introduce forest bases, 
which are interpretations that interpret transitive roles in an unrestricted way, i.e., not 
necessarily in a transitive way. For a forest base, we require in particular that all
relationships between elements of the domain that can be inferred by transitively closing a
role are omitted. In the following, we assume that the ABox contains at least one individual
name, i.e., $Inds(\mathcal{A})$ is non-empty. This is w.l.o.g. since we can always add an assertion
$\top(a)$ to the ABox for a fresh individual name $a \in N_I$. For readers familiar with tableau
algorithms, it is worth noting that forest bases can also be thought of as those tableaux
generated from a complete and clash-free completion tree (Horrocks et al., 2000).

Definition 6. Let $\mathbb{N}$ denote the non-negative integers and $\mathbb{N}^*$ the set of all (finite) words 
over the alphabet $\mathbb{N}$. A tree $T$ is a non-empty, prefix-closed subset of $\mathbb{N}^*$. For $w, w' \in T$, 
we call $w'$ a successor of $w$ if $w' = w \cdot c$ for some $c \in \mathbb{N}$, where "$\cdot$" denotes concatenation. 
We call $w'$ a neighbor of $w$ if $w'$ is a successor of $w$ or vice versa. The empty word $\varepsilon$ is 
called the root. 
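
The notions of tree, successor, and neighbor can be made concrete with a small sketch. It is only an illustration: words over the non-negative integers are modelled as Python tuples, and the function names (`is_tree`, `is_successor`, `is_neighbor`) are chosen here and do not come from the paper.

```python
# Words over the non-negative integers are modelled as tuples; () is the empty word.

def is_tree(words):
    """Return True if `words` is a non-empty, prefix-closed set of tuples."""
    words = set(words)
    if not words:
        return False
    # Every proper prefix of every word must itself be in the set.
    return all(w[:i] in words for w in words for i in range(len(w)))


def is_successor(w2, w):
    """w2 is a successor of w if w2 = w . c for a single letter c."""
    return len(w2) == len(w) + 1 and w2[:-1] == w


def is_neighbor(w2, w):
    """Neighbors are successors in either direction."""
    return is_successor(w2, w) or is_successor(w, w2)


# Hypothetical example tree with root (), two children, and two grandchildren.
T = {(), (1,), (2,), (2, 1), (2, 2)}
```

In this encoding the root is the empty tuple, and siblings such as `(1,)` and `(2,)` are not neighbors of each other.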

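A forest base for $\mathcal{K}$ is an interpretation $\mathcal{J} = (\Delta^{\mathcal{J}}, \cdot^{\mathcal{J}})$ that interprets transitive roles in 
an unrestricted (i.e., not necessarily transitive) way and, additionally, satisfies the following 
conditions: 

T1 $\Delta^{\mathcal{J}} \subseteq \mathsf{Inds}(\mathcal{A}) \times \mathbb{N}^*$ such that, for all $a \in \mathsf{Inds}(\mathcal{A})$, the set $\{w \mid (a, w) \in \Delta^{\mathcal{J}}\}$ is a tree; 

T2 if $((a, w), (a', w')) \in r^{\mathcal{J}}$, then either $w = w' = \varepsilon$ or $a = a'$ and $w'$ is a neighbor of $w$; 

T3 for each $a \in \mathsf{Inds}(\mathcal{A})$, $a^{\mathcal{J}} = (a, \varepsilon)$. 

An interpretation $\mathcal{I}$ is canonical for $\mathcal{K}$ if there exists a forest base $\mathcal{J}$ for $\mathcal{K}$ such that $\mathcal{I}$ is 
identical to $\mathcal{J}$ except that, for all non-simple roles $r$, we have 

$$r^{\mathcal{I}} = r^{\mathcal{J}} \cup \bigcup_{s \sqsubseteq^{*}_{\mathcal{R}} r,\ s \in \mathsf{Trans}_{\mathcal{R}}} (s^{\mathcal{J}})^{+}$$ 

In this case, we say that $\mathcal{J}$ is a forest base for $\mathcal{I}$ and, if $\mathcal{I} \models \mathcal{K}$, we say that $\mathcal{I}$ is a canonical 
model for $\mathcal{K}$. △ 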
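
The step from a forest base to a canonical interpretation can be sketched on a finite fragment. The following is a minimal illustration under simplifying assumptions that are not part of the paper: relations are finite sets of pairs, inverse roles are ignored, and the caller supplies the sub-role relation already closed reflexively and transitively. All names (`transitive_closure`, `canonical_extension`, `A0`, `B0`, `B1`) are hypothetical.

```python
def transitive_closure(edges):
    """Return the transitive closure of a finite binary relation."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def canonical_extension(base, r, trans, sub_roles):
    """Interpretation of r: the base relation of r united with the transitive
    closure of every transitive sub-role s of r.

    base:      dict, role name -> set of pairs (a finite part of the forest base)
    trans:     set of transitive role names
    sub_roles: dict, role name -> set of its sub-roles (assumed to be closed
               reflexively and transitively by the caller)
    """
    result = set(base.get(r, set()))
    for s in sub_roles.get(r, {r}):
        if s in trans:
            result |= transitive_closure(base.get(s, set()))
    return result


# Two roots (a, eps) and (b, eps) and one successor (b, 1) of the second root.
A0, B0, B1 = ("a", ()), ("b", ()), ("b", (1,))
base = {"r": {(A0, B0), (B0, B1)}}
extended = canonical_extension(base, "r", trans={"r"}, sub_roles={"r": {"r"}})
```

With `r` transitive, `extended` additionally contains the pair `(A0, B1)`, which is exactly the kind of "short-cut" between different trees that a forest base omits.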

For convenience, we extend the notion of successors and neighbors to elements in canonical 
models. Let $\mathcal{I}$ be a canonical model with $(a, w), (a', w') \in \Delta^{\mathcal{I}}$. We call $(a', w')$ a 
successor of $(a, w)$ if either $a = a'$ and $w' = w \cdot c$ for some $c \in \mathbb{N}$ or $w = w' = \varepsilon$. We call 
$(a', w')$ a neighbor of $(a, w)$ if $(a', w')$ is a successor of $(a, w)$ or vice versa. 

Please note that the above definition implicitly relies on the unique name assumption 
(UNA) (cf. T3). This is w.l.o.g. as we can guess an appropriate partition among the in- 
dividual names and replace the individual names in each partition with one representative 
individual name from that partition. In Section 6, we show how the partitioning of individ- 
ual names can be used to simulate the UNA, hence, our decision procedure does not rely 
on the UNA. We also show that this does not affect the complexity. 

Lemma 7. Let $\mathcal{K}$ be a $\mathcal{SHIQ}$ knowledge base and $q = q_1 \vee \ldots \vee q_n$ a union of conjunctive 
queries. Then $\mathcal{K} \not\models q$ iff there exists a canonical model $\mathcal{I}$ of $\mathcal{K}$ such that $\mathcal{I} \not\models q$. 

A detailed proof is given in the appendix. Informally, for the only if direction, we can 
take an arbitrary counter-model for the query, which exists by assumption, and "unravel" 
all non-tree structures. Since, during the unraveling process, we only replace cycles in the 
model by infinite paths and leave the interpretation of concepts unchanged, the query is 
still not satisfied in the unravelled canonical model. The if direction of the proof is trivial. 



4.2 The Running Example 

We use the following Boolean query and knowledge base as a running example: 

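Example 8. Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a $\mathcal{SHIQ}$ knowledge base with $r, t \in N_{tR}$, $k \in \mathbb{N}$, 

$\mathcal{T} = \{\ C_k \sqsubseteq\ {\geq} k\, p.\top,$ 
$\qquad C_3 \sqsubseteq\ {\geq} 3\, p.\top,$ 
$\qquad D_2 \sqsubseteq \exists s^-.\top \sqcap \exists t.\top \ \}$ 

$\mathcal{R} = \{\ t \sqsubseteq t^-,$ 
$\qquad s^- \sqsubseteq r \ \}$ 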

$\mathcal{A} = \{\ r(a, b),$ 
$\qquad (\exists p.C_k \sqcap \exists p.C \sqcap \exists r^-.C_3)(a),$ 
$\qquad (\exists p.D_1 \sqcap \exists r.D_2)(b) \ \}$ 

and $q = \{r(u, x), r(x, y), t(y, y), s(z, y), r(u, z)\}$ with $\mathsf{Inds}(q) = \emptyset$ and $\mathsf{Vars}(q) = \{u, x, y, z\}$. 
For simplicity, we choose to use a CQ instead of a UCQ. In case of a UCQ, the rewriting 
steps are applied to each disjunct separately. 




Figure 1: A representation of a canonical interpretation $\mathcal{I}$ for $\mathcal{K}$. 



Figure 1 shows a representation of a canonical model $\mathcal{I}$ for the knowledge base $\mathcal{K}$ from 
Example 8. Each labeled node represents an element in the domain, e.g., the individual 
name $a$ is represented by the node labeled $(a, \varepsilon)$. The edges represent relationships between 
individuals. For example, we can read the $r$-labeled edge from $(a, \varepsilon)$ to $(b, \varepsilon)$ in both 
directions, i.e., $(a^{\mathcal{I}}, b^{\mathcal{I}}) = ((a, \varepsilon), (b, \varepsilon)) \in r^{\mathcal{I}}$ and $(b^{\mathcal{I}}, a^{\mathcal{I}}) = ((b, \varepsilon), (a, \varepsilon)) \in (r^-)^{\mathcal{I}}$. The 
"short-cuts" due to transitive roles are shown as dashed lines, while the relationship between 
the nodes that represent ABox individuals is shown in grey. Please note that we did not 
indicate the interpretations of all concepts in the figure. 

Since $\mathcal{I}$ is a canonical model for $\mathcal{K}$, the elements of the domain are pairs $(a, w)$, where 
$a$ indicates the individual name that corresponds to the root of the tree, i.e., $a^{\mathcal{I}} = (a, \varepsilon)$, 
and the elements in the second place form a tree according to our definition of trees. For 
each individual name $a$ in our ABox, we can, therefore, easily define the tree rooted in $a$ as 
$\{w \mid (a, w) \in \Delta^{\mathcal{I}}\}$. 




Figure 2: A forest base for the interpretation represented by Figure 1. 



Figure 2 shows a representation of a forest base for the interpretation from Figure 1 
above. For simplicity, the interpretation of concepts is no longer shown. The two trees, 
rooted in $(a, \varepsilon)$ and $(b, \varepsilon)$ respectively, are now clear. 

A graphical representation of the query q from Example 8 is shown in Figure 3, where 
the meaning of the nodes and edges is analogous to the ones given for interpretations. We 
call this query a cyclic query since its underlying undirected graph is cyclic (cf. Definition 2). 

Figure 4 shows a match $\pi$ for $q$ and $\mathcal{I}$ and, although we consider only one canonical 
model here, it is not hard to see that the query is true in each model of the knowledge base, 
i.e., $\mathcal{K} \models q$. 

Figure 3: A graph representation of the query from Example 8. 

Figure 4: A match $\pi$ for the query $q$ from Example 8 onto the model $\mathcal{I}$ from Figure 1. 

The forest model property is also exploited in the query rewriting process. We want to 
rewrite $q$ into a set $q_1, \ldots, q_n$ of ground or tree-shaped queries such that $\mathcal{K} \models q$ 
iff $\mathcal{K} \models q_1 \vee \ldots \vee q_n$. Since the resulting queries are ground or tree-shaped queries, we can 
use the known techniques for deciding entailment of these queries. As a first step, we 
transform $q$ into a set of forest-shaped queries. Intuitively, forest-shaped queries consist 
of a set of tree-shaped sub-queries, where the roots of these trees might be arbitrarily 
interconnected (by atoms of the form $r(t, t')$). A tree-shaped query is a special case of a 
forest-shaped query. We will call the arbitrarily interconnected terms of a forest-shaped 
query the root choice (or, for short, just roots). At the end of the rewriting process, we 
replace the roots with individual names from $\mathsf{Inds}(\mathcal{A})$ and transform the tree parts into 
a concept by applying the so-called rolling-up or tuple graph technique (Tessaris, 2001; 
Calvanese et al., 1998a). 

In the proof of the correctness of our procedure, we use the structure of the forest bases 
in order to explicate the transitive "short-cuts" used in the query match. By explicating we 
mean that we replace each role atom that is mapped to such a short-cut with a sequence 
of role atoms such that an extended match for the modified query uses only paths that are 
in the forest base. 

4.3 The Rewriting Steps 

The rewriting process for a query $q$ is a six stage process. At the end of this process, the 
rewritten query may or may not be in a forest shape. As we show later, this "don't know" 
non-determinism does not compromise the correctness of the algorithm. In the first stage, 
we derive a collapsing $q_{co}$ of $q$ by adding (possibly several) equality atoms to $q$. Consider, 
for example, the cyclic query $q = \{r(x, y), r(x, y'), s(y, z), s(y', z)\}$ (see Figure 5), which can 
be transformed into a tree-shaped one by adding the equality atom $y \approx y'$. 

Figure 5: A representation of a cyclic query and of the tree-shaped query obtained by adding 
the atom $y \approx y'$ to the query depicted on the left hand side. 
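
The effect of a collapsing on the shape of a query can be checked mechanically. The sketch below is an illustration only and rests on assumptions not made in the paper: the query is connected, contains no individual names, and role atoms are given as triples `(role, t, t')`. Under these assumptions, a query is tree-shaped exactly if the undirected graph on its equivalence classes of terms has no loops and no cycles, which is what the hypothetical helper `is_acyclic_modulo` tests with a union-find structure.

```python
def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def is_acyclic_modulo(role_atoms, equalities=()):
    """True if the query graph, taken modulo the equality atoms, has no cycle."""
    terms = set()
    for _, s, t in role_atoms:
        terms.update((s, t))
    for s, t in equalities:
        terms.update((s, t))

    # Equivalence classes of terms induced by the equality atoms.
    cls = {t: t for t in terms}
    for s, t in equalities:
        cls[find(cls, s)] = find(cls, t)

    # Several role atoms between the same two classes count as one edge.
    edges = set()
    for _, s, t in role_atoms:
        a, b = find(cls, s), find(cls, t)
        if a == b:
            return False  # a loop r(t, t) cannot be mapped to two neighbors
        edges.add(frozenset((a, b)))

    # A second union-find over the classes detects cycles among these edges.
    comp = {}
    for t in terms:
        rep = find(cls, t)
        comp[rep] = rep
    for edge in edges:
        a, b = tuple(edge)
        ra, rb = find(comp, a), find(comp, b)
        if ra == rb:
            return False
        comp[ra] = rb
    return True


# The cyclic query of Figure 5 and its collapsing with y ≈ y'.
q = [("r", "x", "y"), ("r", "x", "y'"), ("s", "y", "z"), ("s", "y'", "z")]
cyclic_before = not is_acyclic_modulo(q)
tree_after = is_acyclic_modulo(q, equalities=[("y", "y'")])
```

For the query of Figure 5, the check fails on the original atoms and succeeds once `y` and `y'` are identified.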

A common property of the next three rewriting steps is that they allow for substituting 
the implicit short-cut edges with explicit paths that induce the short-cut. The three steps 
aim at different cases in which these short-cuts can occur and we describe their goals and 
application now in more detail: 

The second stage is called split rewriting. In a split rewriting we take care of all role 
atoms that are matched to transitive "short-cuts" connecting elements of two different trees 
and by-passing one or both of their roots. We substitute these short-cuts with either one or 
two role atoms such that the roots are included. In our running example, $\pi$ maps $u$ to $(a, 3)$ 
and $x$ to $(b, \varepsilon)$. Hence $\mathcal{I} \models^{\pi} r(u, x)$, but the used $r$-edge is a transitive short-cut connecting 
the tree rooted in $a$ with the tree rooted in $b$, and by-passing $(a, \varepsilon)$. Similar arguments hold 
for the atom $r(u, z)$, where the path that implies this short-cut relationship goes via the 
two roots $(a, \varepsilon)$ and $(b, \varepsilon)$. It is clear that $r$ must be a non-simple role since, in the forest 
base $\mathcal{J}$ for $\mathcal{I}$, there is no "direct" connection between different trees other than between the 
roots of the trees. Hence, $(\pi(u), \pi(x)) \in r^{\mathcal{I}}$ holds only because there is a role $s \in \mathsf{Trans}_{\mathcal{R}}$ 
such that $s \sqsubseteq^{*}_{\mathcal{R}} r$. In case of our example, $r$ itself is transitive. A split rewriting eliminates 
transitive short-cuts between different trees of a canonical model and adds the "missing" 
variables and role atoms matching the sequence of edges that induce the short-cut. 




Figure 6: A split rewriting $q_{sr}$ for the query shown in Figure 3. 

Figure 6 depicts the split rewriting 

$q_{sr} = \{\, r(u, ux), r(ux, x), r(x, y), t(y, y), s(z, y),$ 
$\qquad r(u, ux), r(ux, x), r(x, z) \,\}$ 

of $q$ that is obtained from $q$ by replacing (i) $r(u, x)$ with $r(u, ux)$ and $r(ux, x)$ and (ii) $r(u, z)$ 
with $r(u, ux)$, $r(ux, x)$, and $r(x, z)$. Please note that we both introduced a new variable ($ux$) 
and re-used an existing variable ($x$). Figure 7 shows a match for $q_{sr}$ and the canonical model 
$\mathcal{I}$ of $\mathcal{K}$ in which the two trees are only connected via the roots. For the rewritten query, we 
also guess a set of roots, which contains the variables that are mapped to the roots in the 
canonical model. For our running example, we guess that the set of roots is $\{ux, x\}$. 
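
The two replacements that lead from $q$ to $q_{sr}$ can be replayed with a few lines of code. This is only a sketch: queries are modelled as sets of triples `(role, t, t')`, the helper `replace_with_path` is a hypothetical name, and the code does not verify that the chosen role is a transitive sub-role of the replaced one (in the running example, $r$ itself is transitive).

```python
def replace_with_path(query, atom, path, role=None):
    """Replace the role atom `atom` = (r, t, t') by a chain of atoms along
    `path` = [t, ..., t'], using `role` (default: r) for every new atom."""
    old_role, start, end = atom
    if atom not in query or path[0] != start or path[-1] != end:
        raise ValueError("path must connect the two terms of an atom in the query")
    role = role or old_role
    chain = {(role, a, b) for a, b in zip(path, path[1:])}
    return (set(query) - {atom}) | chain


# The running example query q from Example 8.
q = {("r", "u", "x"), ("r", "x", "y"), ("t", "y", "y"),
     ("s", "z", "y"), ("r", "u", "z")}

# (i) r(u, x) becomes r(u, ux), r(ux, x); (ii) r(u, z) becomes
# r(u, ux), r(ux, x), r(x, z), re-using the variables ux and x.
step1 = replace_with_path(q, ("r", "u", "x"), ["u", "ux", "x"])
q_sr = replace_with_path(step1, ("r", "u", "z"), ["u", "ux", "x", "z"])
```

Because queries are sets here, the atoms `r(u, ux)` and `r(ux, x)` that both replacements produce appear only once in `q_sr`.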




Figure 7: A split match $\pi_{sr}$ for the query $q_{sr}$ from Figure 6 onto the canonical interpretation 
from Figure 1. 

In the third step, called loop rewriting, we eliminate "loops" for variables $v$ that do not 
correspond to roots by replacing atoms $r(v, v)$ with two atoms $r(v, v')$ and $r(v', v)$, where $v'$ 
can either be a new or an existing variable in $q$. In our running example, we eliminate the 
loop $t(y, y)$ as follows: 

$q_{lr} = \{\, r(u, ux), r(ux, x), r(x, y), t(y, y'), t(y', y), s(z, y),$ 
$\qquad r(u, ux), r(ux, x), r(x, z) \,\}$ 

is the query obtained from $q_{sr}$ (see Figure 6) by replacing $t(y, y)$ with $t(y, y')$ and $t(y', y)$ for 
a new variable $y'$. Please note that, since $t$ is defined as transitive and symmetric, $t(y, y)$ 
is still implied, i.e., the loop is also a transitive short-cut. Figure 8 shows the canonical 
interpretation $\mathcal{I}$ from Figure 1 with a match $\pi_{lr}$ for $q_{lr}$. The introduction of the new variable 
$y'$ is needed in this case since there is no variable that could be re-used and the individual 
$(b, 22)$ is not in the range of the match $\pi_{sr}$. 




Figure 8: A loop rewriting $q_{lr}$ and a match for the canonical interpretation from Figure 1. 

The fourth rewriting step, called forest rewriting, again allows the replacement of role 
atoms with sets of role atoms. This allows the elimination of cycles that are within a single 
tree. A forest rewriting $q_{fr}$ for our example can be obtained from $q_{lr}$ by replacing the role 
atom $r(x, z)$ with $r(x, y)$ and $r(y, z)$, resulting in the query 

$q_{fr} = \{\, r(u, ux), r(ux, x), r(x, y), t(y, y'), t(y', y), s(z, y),$ 
$\qquad r(u, ux), r(ux, x), r(x, y), r(y, z) \,\}$. 

Clearly, this results in tree-shaped sub-queries, one rooted in $ux$ and one rooted in $x$. 
Hence $q_{fr}$ is forest-shaped w.r.t. the root terms $ux$ and $x$. Figure 9 shows the canonical 
interpretation $\mathcal{I}$ from Figure 1 with a match $\pi_{fr}$ for $q_{fr}$. 

Figure 9: A forest rewriting $q_{fr}$ and a forest match $\pi_{fr}$ for the canonical interpretation from 
Figure 1. 

In the fifth step, we use the standard rolling-up technique (Horrocks & Tessaris, 2000; 
Calvanese et al., 1998a) and express the tree-shaped sub-queries as concepts. In order to 
do this, we traverse each tree in a bottom-up fashion and replace each leaf (labeled with a 
concept $C$, say) and its incoming edge (labeled with a role $r$, say) with the concept $\exists r.C$ 
added to its predecessor. For example, the tree rooted in $ux$ (i.e., the role atom $r(u, ux)$) 
can be replaced with the atom $(\exists r^-.\top)(ux)$. Similarly, the tree rooted in $x$ (i.e., the role 
atoms $r(x, y)$, $r(y, z)$, $s(z, y)$, $t(y, y')$, and $t(y', y)$) can be replaced with the atom 

$(\exists r.((\exists(r \sqcap \mathsf{Inv}(s)).\top) \sqcap (\exists(t \sqcap \mathsf{Inv}(t)).\top)))(x)$. 

Please note that we have to use role conjunctions in the resulting query in order to capture 
the semantics of multiple role atoms relating the same pair of variables. 

Recall that, in the split rewriting, we have guessed that $x$ and $ux$ correspond to roots and, 
therefore, correspond to individual names in $\mathsf{Inds}(\mathcal{A})$. In the sixth and last rewriting step, 
we guess which variable corresponds to which individual name and replace the variables with 
the guessed names. A possible guess for our running example would be that $ux$ corresponds 
to $a$ and $x$ to $b$. This results in the (ground) query 

$\{(\exists r^-.\top)(a),\ r(a, b),\ (\exists r.((\exists(r \sqcap \mathsf{Inv}(s)).\top) \sqcap (\exists(t \sqcap \mathsf{Inv}(t)).\top)))(b)\}$, 

which is entailed by $\mathcal{K}$. 

Please note that we focused in the running example on the most reasonable rewriting. 
There are several other possible rewritings, e.g., we obtain another rewriting from $q_{fr}$ by 
replacing $ux$ with $b$ and $x$ with $a$ in the last step. For a UCQ, we apply the rewriting steps 
to each of the disjuncts separately. 

At the end of the rewriting process, we have, for each disjunct, a set of ground queries 
and/or queries that were rolled-up into a single concept atom. The latter queries result from 
forest rewritings that are tree-shaped and have an empty set of roots. Such tree-shaped 
rewritings can match anywhere in a tree and can, thus, not be grounded. Finally, we check 
if our knowledge base entails the disjunction of all the rewritten queries. We show that 
there is a bound on the number of (forest-shaped) rewritings and hence on the number of 
queries produced in the rewriting process. 

Summing up, the rewriting process for a connected conjunctive query q involves the 
following steps: 

1. Build all collapsings of q. 

2. Build all split rewritings of each collapsing w.r.t. a subset R of roots. 

3. Build all loop rewritings of the split rewritings. 

4. Build all (forest-shaped) forest rewritings of the loop rewritings. 

5. Roll up each tree-shaped sub-query in a forest-rewriting into a concept atom and 

6. replace the roots in R with individual names from the ABox in all possible ways. 

Let $q_1, \ldots, q_n$ be the queries resulting from the rewriting process. In the next section, we 
define each rewriting step and prove that $\mathcal{K} \models q$ iff $\mathcal{K} \models q_1 \vee \cdots \vee q_n$. Checking entailment for 
the rewritten queries can easily be reduced to KB consistency and any decision procedure 
for $\mathcal{SHIQ}^{\sqcap}$ KB consistency could be used in order to decide if $\mathcal{K} \models q$. We present one such 
decision procedure in Section 6. 

5. Query Rewriting 

In the previous section, we have used several terms, e.g., tree- or forest-shaped query, 
rather informally. In the following, we give definitions for the terms used in the query 
rewriting process. Once this is done, we formalize the query rewriting steps and prove the 
correctness of the procedure, i.e., we show that the forest-shaped queries obtained in the 
rewriting process can indeed be used for deciding whether a knowledge base entails the 
original query. We do not give the detailed proofs here, but rather some intuitions behind 
the proofs. Proofs in full detail are given in the appendix. 

5.1 Tree- and Forest-Shaped Queries 

In order to define tree- or forest-shaped queries more precisely, we use mappings between 
queries and trees or forests. Instead of mapping equivalence classes of terms by $\approx$ to nodes 
in a tree, we extend some well-known properties of functions as follows: 

Definition 9. For a mapping $f \colon A \to B$, we use $\mathsf{dom}(f)$ and $\mathsf{ran}(f)$ to denote $f$'s domain 
$A$ and range $B$, respectively. Given an equivalence relation $\approx$ on $\mathsf{dom}(f)$, we say that $f$ is 
injective modulo $\approx$ if, for all $a, a' \in \mathsf{dom}(f)$, $f(a) = f(a')$ implies $a \approx a'$, and we say that $f$ 
is bijective modulo $\approx$ if $f$ is injective modulo $\approx$ and surjective. Let $q$ be a query. A tree 
mapping for $q$ is a total function $f$ from terms in $q$ to a tree such that 

1. $f$ is bijective modulo $\approx$, 

2. if $r(t, t') \in q$, then $f(t)$ is a neighbor of $f(t')$, and, 

3. if $a \in \mathsf{Inds}(q)$, then $f(a) = \varepsilon$. 

The query $q$ is tree-shaped if $\sharp(\mathsf{Inds}(q)) \leq 1$ and there is a tree mapping for $q$. 

A root choice $R$ for $q$ is a subset of $\mathsf{Terms}(q)$ such that $\mathsf{Inds}(q) \subseteq R$ and, if $t \in R$ and 
$t \approx t'$, then $t' \in R$. For $t \in R$, we use $\mathsf{Reach}(t)$ to denote the set of terms $t' \in \mathsf{Terms}(q)$ for 
which there exists a sequence of terms $t_1, \ldots, t_n \in \mathsf{Terms}(q)$ such that 

1. $t_1 = t$ and $t_n = t'$, 

2. for all $1 \leq i < n$, there is a role $r$ such that $r(t_i, t_{i+1}) \in q$, and, 

3. for $1 < i \leq n$, if $t_i \in R$, then $t_i \approx t$. 

We call $R$ a root splitting w.r.t. $q$ if either $R = \emptyset$ or if, for $t_i, t_j \in R$, $t_i \not\approx t_j$ implies that 
$\mathsf{Reach}(t_i) \cap \mathsf{Reach}(t_j) = \emptyset$. Each term $t \in R$ induces a sub-query 

$$\mathsf{subq}(q, t) := \{at \in q \mid \text{the terms in } at \text{ occur in } \mathsf{Reach}(t)\} \setminus \{r(t, t) \mid r(t, t) \in q\}.$$ 

A query $q$ is forest-shaped w.r.t. a root splitting $R$ if either $R = \emptyset$ and $q$ is tree-shaped or 
each sub-query $\mathsf{subq}(q, t)$ for $t \in R$ is tree-shaped. △ 

For each term $t \in R$, we collect the terms that are reachable from $t$ in the set $\mathsf{Reach}(t)$. 
By Condition 3, we make sure that $R$ and $\approx$ are such that each $t' \in \mathsf{Reach}(t)$ is either not in 
$R$ or $t \approx t'$. Since queries are connected by assumption, we would otherwise collect all terms 
in $\mathsf{Reach}(t)$ and not just those $t' \notin R$. For a root splitting, we require that the resulting sets 
are mutually disjoint for all terms $t, t' \in R$ that are not equivalent. This guarantees that all 
paths between the sub-queries go via the root nodes of their respective trees. Intuitively, a 
forest-shaped query is one that can potentially be mapped onto a canonical interpretation 
$\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ such that the terms in the root splitting $R$ correspond to roots $(a, \varepsilon) \in \Delta^{\mathcal{I}}$. 
In the definition of $\mathsf{subq}(q, t)$, we exclude loops of the form $r(t, t) \in q$, as these parts of 
the query are grounded later in the query rewriting process and between ground terms, we 
allow arbitrary relationships. 

Consider, for example, the query $q_{sr}$ of our running example from the previous section 
(cf. Figure 6). Let us again make the root choice $R := \{ux, x\}$ for $q_{sr}$. The sets $\mathsf{Reach}(ux)$ 
and $\mathsf{Reach}(x)$ w.r.t. $q_{sr}$ and $R$ are $\{ux, u\}$ and $\{x, y, z\}$ respectively. Since both sets are 
disjoint, $R$ is a root splitting w.r.t. $q_{sr}$. If we choose, however, $R := \{x, y\}$, the set $R$ is not 
a root splitting w.r.t. $q_{sr}$ since $\mathsf{Reach}(x)$ and $\mathsf{Reach}(y)$ both contain $z$ and are, thus, not disjoint. 
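
The sets $\mathsf{Reach}(t)$ and the root splitting condition can be computed with a simple graph search. The following sketch makes assumptions that go beyond the text: role atoms are triples `(role, t, t')` that are read in both directions (which is what the values given for the running example require), there are no equality atoms, and the roots are therefore pairwise non-equivalent. The names `reach` and `is_root_splitting` are chosen for this illustration.

```python
def reach(role_atoms, roots, t):
    """Terms reachable from the root t without passing through another root."""
    adjacent = {}
    for _, a, b in role_atoms:
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)  # atoms are read in both directions
    seen, todo = {t}, [t]
    while todo:
        current = todo.pop()
        for nxt in adjacent.get(current, ()):
            # Condition 3: a term in R may only occur if it is (equivalent to) t.
            if nxt in seen or (nxt in roots and nxt != t):
                continue
            seen.add(nxt)
            todo.append(nxt)
    return seen


def is_root_splitting(role_atoms, roots):
    """True if the Reach sets of distinct roots are pairwise disjoint."""
    roots = sorted(roots)
    return all(
        reach(role_atoms, roots, a).isdisjoint(reach(role_atoms, roots, b))
        for i, a in enumerate(roots) for b in roots[i + 1:]
    )


# The split rewriting q_sr of the running example (as a set of atoms).
q_sr = [("r", "u", "ux"), ("r", "ux", "x"), ("r", "x", "y"),
        ("t", "y", "y"), ("s", "z", "y"), ("r", "x", "z")]

reach_ux = reach(q_sr, {"ux", "x"}, "ux")
reach_x = reach(q_sr, {"ux", "x"}, "x")
```

For the root choice `{"ux", "x"}` the two sets are disjoint, while for `{"x", "y"}` the term `z` is reachable from both roots, so the check fails.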

5.2 From Graphs to Forests 

We are now ready to define the query rewriting steps. Given an arbitrary query, we ex- 
haustively apply the rewriting steps and show that we can use the resulting queries that are 
forest-shaped for deciding entailment of the original query. Please note that the following 
definitions are for conjunctive queries and not for unions of conjunctive queries since we 
apply the rewriting steps for each disjunct separately. 

Definition 10. Let $q$ be a Boolean conjunctive query. A collapsing $q_{co}$ of $q$ is obtained by 
adding zero or more equality atoms of the form $t \approx t'$ for $t, t' \in \mathsf{Terms}(q)$ to $q$. We use $\mathsf{co}(q)$ 
to denote the set of all queries that are a collapsing of $q$. 

Let $\mathcal{K}$ be a $\mathcal{SHIQ}$ knowledge base. A query $q_{sr}$ is called a split rewriting of $q$ w.r.t. $\mathcal{K}$ 
if it is obtained from $q$ by choosing, for each atom $r(t, t') \in q$, to either: 

1. do nothing, 

2. choose a role $s \in \mathsf{Trans}_{\mathcal{R}}$ such that $s \sqsubseteq^{*}_{\mathcal{R}} r$ and replace $r(t, t')$ with $s(t, u)$, $s(u, t')$, or 

3. choose a role $s \in \mathsf{Trans}_{\mathcal{R}}$ such that $s \sqsubseteq^{*}_{\mathcal{R}} r$ and replace $r(t, t')$ with $s(t, u)$, $s(u, u')$, 
$s(u', t')$, 

where $u, u' \in N_V$ are possibly fresh variables. We use $\mathsf{sr}_{\mathcal{K}}(q)$ to denote the set of all pairs 
$(q_{sr}, R)$ for which there is a query $q_{co} \in \mathsf{co}(q)$ such that $q_{sr}$ is a split rewriting of $q_{co}$ and $R$ 
is a root splitting w.r.t. $q_{sr}$. 

A query $q_{lr}$ is called a loop rewriting of $q$ w.r.t. a root splitting $R$ and $\mathcal{K}$ if it is obtained 
from $q$ by choosing, for all atoms of the form $r(t, t) \in q$ with $t \notin R$, a role $s \in \mathsf{Trans}_{\mathcal{R}}$ such 
that $s \sqsubseteq^{*}_{\mathcal{R}} r$ and by replacing $r(t, t)$ with two atoms $s(t, t')$ and $s(t', t)$ for $t' \in N_V$ a possibly 
fresh variable. We use $\mathsf{lr}_{\mathcal{K}}(q)$ to denote the set of all pairs $(q_{lr}, R)$ for which there is a tuple 
$(q_{sr}, R) \in \mathsf{sr}_{\mathcal{K}}(q)$ such that $q_{lr}$ is a loop rewriting of $q_{sr}$ w.r.t. $R$ and $\mathcal{K}$. 

For a forest rewriting, fix a set $V \subseteq N_V$ of variables not occurring in $q$ such that 
$\sharp(V) \leq \sharp(\mathsf{Vars}(q))$. A forest rewriting $q_{fr}$ w.r.t. a root splitting $R$ of $q$ and $\mathcal{K}$ is obtained 
from $q$ by choosing, for each role atom $r(t, t')$ such that either $R = \emptyset$ and $r(t, t') \in q$ or 
there is some $t_r \in R$ and $r(t, t') \in \mathsf{subq}(q, t_r)$ to either 

1. do nothing, or 

2. choose a role $s \in \mathsf{Trans}_{\mathcal{R}}$ such that $s \sqsubseteq^{*}_{\mathcal{R}} r$ and replace $r(t, t')$ with $\ell \leq \sharp(\mathsf{Vars}(q))$ role 
atoms $s(t_1, t_2), \ldots, s(t_\ell, t_{\ell+1})$, where $t_1 = t$, $t_{\ell+1} = t'$, and $t_2, \ldots, t_\ell \in \mathsf{Vars}(q) \cup V$. 

We use $\mathsf{fr}_{\mathcal{K}}(q)$ to denote the set of all pairs $(q_{fr}, R)$ for which there is a tuple $(q_{lr}, R) \in \mathsf{lr}_{\mathcal{K}}(q)$ 
such that $q_{fr}$ is a forest-shaped forest rewriting of $q_{lr}$ w.r.t. $R$ and $\mathcal{K}$. △ 

If $\mathcal{K}$ is clear from the context, we say that $q'$ is a split, loop, or forest rewriting of 
$q$ instead of saying that $q'$ is a split, loop, or forest rewriting of $q$ w.r.t. $\mathcal{K}$. We assume 
that $\mathsf{sr}_{\mathcal{K}}(q)$, $\mathsf{lr}_{\mathcal{K}}(q)$, and $\mathsf{fr}_{\mathcal{K}}(q)$ contain no isomorphic queries, i.e., differences in (newly 
introduced) variable names only are neglected. 

In the next section, we show how we can build a disjunction of conjunctive queries 
$q_1 \vee \cdots \vee q_\ell$ from the queries in $\mathsf{fr}_{\mathcal{K}}(q)$ such that each $q_i$ for $1 \leq i \leq \ell$ is either of the form 
$C(v)$ for a single variable $v \in \mathsf{Vars}(q_i)$ or $q_i$ is ground, i.e., $q_i$ contains only constants and 
no variables. It then remains to show that $\mathcal{K} \models q$ iff $\mathcal{K} \models q_1 \vee \cdots \vee q_\ell$. 

5.3 From Trees to Concepts 

In order to transform a tree-shaped query into a single concept atom and a forest-shaped 
query into a ground query, we define a mapping / from the terms in each tree-shaped sub- 
query to a tree. We then incrementally build a concept that corresponds to the tree-shaped 
query by traversing the tree in a bottom-up fashion, i.e., from the leaves upwards to the 
root. 

Definition 11. Let $q$ be a tree-shaped query with at most one individual name. If $a \in 
\mathsf{Inds}(q)$, then let $t_r = a$, otherwise let $t_r = v$ for some variable $v \in \mathsf{Vars}(q)$. Let $f$ be a tree 
mapping such that $f(t_r) = \varepsilon$. We now inductively assign, to each term $t \in \mathsf{Terms}(q)$, a 
concept $\mathsf{con}(q, t)$ as follows: 

• if $f(t)$ is a leaf of $\mathsf{ran}(f)$, then $\mathsf{con}(q, t) := \bigsqcap_{C(t) \in q} C$, 

• if $f(t)$ has successors $f(t_1), \ldots, f(t_k)$, then 

$$\mathsf{con}(q, t) := \bigsqcap_{C(t) \in q} C \ \sqcap \bigsqcap_{1 \leq i \leq k} \exists \Big( \bigsqcap_{r(t, t_i) \in q} r \Big).\mathsf{con}(q, t_i).$$ 

Finally, the query concept of $q$ w.r.t. $t_r$ is $\mathsf{con}(q, t_r)$. △ 

Please note that the above definition takes equality atoms into account. This is because 
the function $f$ is bijective modulo $\approx$ and, in case there are concept atoms $C(t)$ and $C(t')$ 
for $t \approx t'$, both concepts are conjoined in the query concept, since membership of atoms in 
$q$ is read modulo $\approx$. Similar arguments can be applied to the role atoms. 
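
The inductive construction of the query concept can be sketched as a recursive function that produces the concept as a string. This is an illustration under simplifying assumptions: the query is tree-shaped, contains no equality atoms, concept atoms are pairs `(C, t)`, role atoms are triples `(role, t, t')`, and an atom pointing from a child to its parent contributes the inverse role. The function name `roll_up` and the string syntax are choices made here, not the paper's.

```python
def roll_up(concept_atoms, role_atoms, root):
    """Return con(q, root) as a string for a tree-shaped query q."""

    def wrap(concept):
        # Parenthesise anything that is not a single symbol.
        return f"({concept})" if " " in concept else concept

    def roles_between(parent, child):
        forward = {r for r, a, b in role_atoms if (a, b) == (parent, child)}
        backward = {f"Inv({r})" for r, a, b in role_atoms if (a, b) == (child, parent)}
        return sorted(forward | backward)

    def con(term, parent):
        # Conjoin the concepts asserted for this term ...
        parts = sorted(c for c, t in concept_atoms if t == term)
        neighbors = {b for _, a, b in role_atoms if a == term}
        neighbors |= {a for _, a, b in role_atoms if b == term}
        # ... and one existential restriction per successor in the tree.
        for child in sorted(neighbors - {parent}):
            roles = roles_between(term, child)
            role = roles[0] if len(roles) == 1 else "(" + " ⊓ ".join(roles) + ")"
            parts.append(f"∃{role}.{wrap(con(child, term))}")
        return " ⊓ ".join(parts) if parts else "⊤"

    return con(root, None)


# The two tree-shaped sub-queries of the forest rewriting in the running example.
tree_ux = [("r", "u", "ux")]
tree_x = [("r", "x", "y"), ("r", "y", "z"), ("s", "z", "y"),
          ("t", "y", "y'"), ("t", "y'", "y")]
concept_ux = roll_up([], tree_ux, "ux")
concept_x = roll_up([], tree_x, "x")
```

Up to the order of conjuncts, `concept_x` corresponds to the concept $\exists r.((\exists(r \sqcap \mathsf{Inv}(s)).\top) \sqcap (\exists(t \sqcap \mathsf{Inv}(t)).\top))$ from the running example, and `concept_ux` to $\exists \mathsf{Inv}(r).\top$.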

The following lemma shows that query concepts indeed capture the semantics of q. 

Lemma 12. Let $q$ be a tree-shaped query with $t_r \in \mathsf{Terms}(q)$ as defined above, $C_q = 
\mathsf{con}(q, t_r)$, and $\mathcal{I}$ an interpretation. Then $\mathcal{I} \models q$ iff there is a match $\pi$ and an element 
$d \in C_q^{\mathcal{I}}$ such that $\pi(t_r) = d$. 

The proof given by Horrocks, Sattler, Tessaris, and Tobies (1999) easily transfers from 
$\mathcal{DLR}$ to $\mathcal{SHIQ}$. By applying the result from the above lemma, we can now transform a 
forest-shaped query into a ground query as follows: 

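Definition 13. Let $(q_{fr}, R) \in \mathsf{fr}_{\mathcal{K}}(q)$ for $R \neq \emptyset$, and $\tau \colon R \to \mathsf{Inds}(\mathcal{A})$ a total function such 
that, for each $a \in \mathsf{Inds}(q)$, $\tau(a) = a$ and, for $t, t' \in R$, $\tau(t) = \tau(t')$ iff $t \approx t'$. We call such a 
mapping $\tau$ a ground mapping for $R$ w.r.t. $\mathcal{A}$. We obtain a ground query $\mathsf{ground}(q_{fr}, R, \tau)$ 
of $q_{fr}$ w.r.t. the root splitting $R$ and ground mapping $\tau$ as follows: 

• replace each $t \in R$ with $\tau(t)$, and, 

• for each $a \in \mathsf{ran}(\tau)$, replace the sub-query $q_a = \mathsf{subq}(q_{fr}, a)$ with $\mathsf{con}(q_a, a)$. 

We define the set $\mathsf{ground}_{\mathcal{K}}(q)$ of ground queries for $q$ w.r.t. $\mathcal{K}$ as follows: 

$$\mathsf{ground}_{\mathcal{K}}(q) := \{ q' \mid \text{there exists some } (q_{fr}, R) \in \mathsf{fr}_{\mathcal{K}}(q) \text{ with } R \neq \emptyset \text{ and some ground mapping } \tau \text{ w.r.t. } \mathcal{A} \text{ and } R \text{ such that } q' = \mathsf{ground}(q_{fr}, R, \tau) \}$$ 

We define the set $\mathsf{trees}_{\mathcal{K}}(q)$ of tree queries for $q$ as follows: 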

$$\mathsf{trees}_{\mathcal{K}}(q) := \{ q' \mid \text{there exists some } (q_{fr}, \emptyset) \in \mathsf{fr}_{\mathcal{K}}(q) \text{ and } v \in \mathsf{Vars}(q_{fr}) \text{ such that } q' = (\mathsf{con}(q_{fr}, v))(v) \}$$ △ 

Going back to our running example, we have already seen that $(q_{fr}, \{ux, x\})$ belongs to 
the set $\mathsf{fr}_{\mathcal{K}}(q)$ for 

$q_{fr} = \{\, r(u, ux), r(ux, x), r(x, y), t(y, y'), t(y', y), s(z, y), r(y, z) \,\}$. 

There are also several other queries in the set $\mathsf{fr}_{\mathcal{K}}(q)$, e.g., $(q, \{u, x, y, z\})$, where $q$ is the 
original query and the root splitting $R$ is such that $R = \mathsf{Terms}(q)$, i.e., all terms are in the 
root choice for $q$. In order to build the set $\mathsf{ground}_{\mathcal{K}}(q)$, we now build all possible ground 
mappings $\tau$ for the set $\mathsf{Inds}(\mathcal{A})$ of individual names in our ABox and the root splittings for 
the queries in $\mathsf{fr}_{\mathcal{K}}(q)$. The tuple $(q_{fr}, \{ux, x\}) \in \mathsf{fr}_{\mathcal{K}}(q)$ contributes two ground queries for 
the set $\mathsf{ground}_{\mathcal{K}}(q)$: 

$\mathsf{ground}(q_{fr}, \{ux, x\}, \{ux \mapsto a, x \mapsto b\}) =$ 

$\{r(a, b),\ (\exists \mathsf{Inv}(r).\top)(a),\ (\exists r.((\exists(r \sqcap \mathsf{Inv}(s)).\top) \sqcap (\exists(t \sqcap \mathsf{Inv}(t)).\top)))(b)\}$, 

where $\exists \mathsf{Inv}(r).\top$ is the query concept for the (tree-shaped) sub-query $\mathsf{subq}(q_{fr}, ux)$ and 
$\exists r.((\exists(r \sqcap \mathsf{Inv}(s)).\top) \sqcap (\exists(t \sqcap \mathsf{Inv}(t)).\top))$ is the query concept for $\mathsf{subq}(q_{fr}, x)$, and 

$\mathsf{ground}(q_{fr}, \{ux, x\}, \{ux \mapsto b, x \mapsto a\}) =$ 

$\{r(b, a),\ (\exists \mathsf{Inv}(r).\top)(b),\ (\exists r.((\exists(r \sqcap \mathsf{Inv}(s)).\top) \sqcap (\exists(t \sqcap \mathsf{Inv}(t)).\top)))(a)\}$. 

The tuple $(q, \{u, x, y, z\}) \in \mathsf{fr}_{\mathcal{K}}(q)$, however, does not contribute a ground query since, for 
a ground mapping, we require that $\tau(t) = \tau(t')$ iff $t \approx t'$ and there are only two individual 
names in $\mathsf{Inds}(\mathcal{A})$ compared to four terms in $q$ that need a distinct value. Intuitively, this is 
not a restriction, since in the first rewriting step (collapsing) we produce all those queries 
in which the terms of $q$ have been identified with each other in all possible ways. In our 
example, $\mathcal{K} \models q$ and $\mathcal{K} \models q_1 \vee \cdots \vee q_\ell$, where $q_1 \vee \cdots \vee q_\ell$ are the queries from $\mathsf{trees}_{\mathcal{K}}(q)$ and 
$\mathsf{ground}_{\mathcal{K}}(q)$, since each model $\mathcal{I}$ of $\mathcal{K}$ satisfies $q_i = \mathsf{ground}(q_{fr}, \{ux, x\}, \{ux \mapsto a, x \mapsto b\})$. 
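
The counting argument in this example can be reproduced by enumerating ground mappings. The sketch below assumes, beyond what the text states, that the root splitting contains only variables and one representative per equivalence class, so that a ground mapping is simply an injective assignment of roots to individual names. The function name `ground_mappings` is hypothetical.

```python
from itertools import permutations


def ground_mappings(roots, individuals):
    """All injective assignments of root variables to individual names."""
    roots = sorted(roots)
    return [dict(zip(roots, image))
            for image in permutations(sorted(individuals), len(roots))]


# Root splitting {ux, x} over the two ABox individuals a and b.
two_roots = ground_mappings({"ux", "x"}, {"a", "b"})

# Root splitting {u, x, y, z}: four pairwise distinct roots, two individuals.
four_roots = ground_mappings({"u", "x", "y", "z"}, {"a", "b"})
```

The first call yields the two mappings used above, while the second yields none, since four non-equivalent roots cannot be mapped injectively to two individual names.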

5.4 Query Matches 

Even if a query is true in a canonical model, it does not necessarily mean that the query 
is tree- or forest-shaped. However, a match $\pi$ for a canonical interpretation can guide the 
process of rewriting a query. Similarly to the definition of tree- or forest-shaped queries, we 
define the shape of matches for a query. In particular, we introduce three different kinds 
of matches: split matches, forest matches, and tree matches such that every tree match is 
a forest match, and every forest match is a split match. The correspondence to the query 
shapes is as follows: given a split match $\pi$, the set of all root nodes $(a, \varepsilon)$ in the range 
of the match defines a root splitting for the query; if $\pi$ is additionally a forest match, the 
query is forest-shaped w.r.t. the root splitting induced by $\pi$; and if $\pi$ is additionally a tree 
match, then the whole query can be mapped to a single tree (i.e., the query is tree-shaped 
or forest-shaped w.r.t. an empty root splitting). Given an arbitrary query match into a 
canonical model, we can first obtain a split match and then a tree or forest match, by using 
the structure of the canonical model for guiding the application of the rewriting steps. 

Definition 14. Let $\mathcal{K}$ be a $\mathcal{SHIQ}$ knowledge base, $q$ a query, $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ a canonical 
model of $\mathcal{K}$, and $\pi \colon \mathsf{Terms}(q) \to \Delta^{\mathcal{I}}$ an evaluation such that $\mathcal{I} \models^{\pi} q$. We call $\pi$ a split match 
if, for all $r(t, t') \in q$, one of the following holds: 

1. $\pi(t) = (a, \varepsilon)$ and $\pi(t') = (b, \varepsilon)$ for some $a, b \in \mathsf{Inds}(\mathcal{A})$; or 

2. $\pi(t) = (a, w)$ and $\pi(t') = (a, w')$ for some $a \in \mathsf{Inds}(\mathcal{A})$ and $w, w' \in \mathbb{N}^*$. 

We call $\pi$ a forest match if, additionally, for each term $t_r \in \mathsf{Terms}(q)$ with $\pi(t_r) = (a, \varepsilon)$ 
and $a \in \mathsf{Inds}(\mathcal{A})$, there is a total and bijective mapping $f$ from $\{(a, w) \mid (a, w) \in \mathsf{ran}(\pi)\}$ to 
a tree $T$ such that $r(t, t') \in \mathsf{subq}(q, t_r)$ implies that $f(\pi(t))$ is a neighbor of $f(\pi(t'))$. We 
call $\pi$ a tree match if, additionally, there is an $a \in \mathsf{Inds}(\mathcal{A})$ such that each element in $\mathsf{ran}(\pi)$ 
is of the form $(a, w)$. 

A split match $\pi$ for a canonical interpretation induces a (possibly empty) root splitting 
$R$ such that $t \in R$ iff $\pi(t) = (a, \varepsilon)$ for some $a \in \mathsf{Inds}(\mathcal{A})$. We call $R$ the root splitting induced 
by $\pi$. △ 

For two elements $(a, w)$ and $(a, w')$ in a canonical model, the path from $(a, w)$ to $(a, w')$ 
is the sequence $(a, w_1), \ldots, (a, w_n)$ where $w = w_1$, $w' = w_n$, and, for $1 \leq i < n$, $w_{i+1}$ is a 
successor of $w_i$. The length of the path is $n$. Please note that, for a forest match, we do 
not require that $w$ is a neighbor of $w'$ or vice versa. This still allows role atoms to be mapped to 
paths in the canonical model of length greater than two, but such paths must be between 
ancestors and not between elements in different branches of the tree. The mapping $f$ to a 
tree also makes sure that if $R$ is the induced root splitting, then each sub-query $\mathsf{subq}(q, t)$ 
for $t \in R$ is tree-shaped. For a tree match, the root splitting is either empty or $t \approx t'$ for 
each $t, t' \in R$, i.e., there is a single root modulo $\approx$, and the whole query is tree-shaped. 
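
The two conditions for a split match are easy to test once a match is given explicitly. In the sketch below, elements of the canonical model are pairs `(individual, word)` with words as tuples, and a match is a dictionary from terms to such pairs. The function name `is_split_match` is hypothetical, and the example uses only the part of the running example's match that the text states explicitly ($u \mapsto (a, 3)$, $x \mapsto (b, \varepsilon)$, and $ux$ mapped to the root for $a$).

```python
EPS = ()  # the empty word, i.e. the root position of a tree


def is_split_match(role_atoms, match):
    """True if every role atom is mapped to two roots or into a single tree."""
    for _, t1, t2 in role_atoms:
        (a, w), (b, v) = match[t1], match[t2]
        both_roots = (w == EPS and v == EPS)
        same_tree = (a == b)
        if not (both_roots or same_tree):
            return False
    return True


# r(u, x) is matched to a short-cut from the tree of a into the root of b.
before = is_split_match(
    [("r", "u", "x")],
    {"u": ("a", (3,)), "x": ("b", EPS)},
)

# After the split rewriting, the path goes via the root (a, eps).
after = is_split_match(
    [("r", "u", "ux"), ("r", "ux", "x")],
    {"u": ("a", (3,)), "ux": ("a", EPS), "x": ("b", EPS)},
)
```

The atom `r(u, x)` alone violates both conditions, whereas the two atoms introduced by the split rewriting satisfy them.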

5.5 Correctness of the Query Rewriting 

The following lemmas state the correctness of the rewriting step by step for each of the 
rewriting stages. Full proofs are given in the appendix. As motivated in the previous 
section, we can use a given canonical model to guide the rewriting process such that we 
obtain a forest-shaped query that also has a match into the model. 

Lemma 15. Let $\mathcal{I}$ be a model for $\mathcal{K}$. 

1. If $\mathcal{I} \models q$, then there is a collapsing $q_{co}$ of $q$ such that $\mathcal{I} \models^{\pi_{co}} q_{co}$ for $\pi_{co}$ an injection 
modulo $\approx$. 

2. If $\mathcal{I} \models^{\pi_{co}} q_{co}$ for a collapsing $q_{co}$ of $q$, then $\mathcal{I} \models q$. 

Given a model $\mathcal{I}$ that satisfies $q$, we can simply add equality atoms for all pairs of terms 
that are mapped to the same element in $\mathcal{I}$. It is not hard to see that this results in a 
mapping that is injective modulo $\approx$. For the second part, it is easy to see that a model that 
satisfies a collapsing also satisfies the original query. 

Lemma 16. Let $\mathcal{I}$ be a model for $\mathcal{K}$. 

1. If $\mathcal{I}$ is canonical and $\mathcal{I} \models^{\pi} q$, then there is a pair $(q_{sr}, R) \in \mathsf{sr}_{\mathcal{K}}(q)$ and a split match 
$\pi_{sr}$ such that $\mathcal{I} \models^{\pi_{sr}} q_{sr}$, $R$ is the induced root splitting of $\pi_{sr}$, and $\pi_{sr}$ is an injection 
modulo $\approx$. 

2. If $(q_{sr}, R) \in \mathsf{sr}_{\mathcal{K}}(q)$ and $\mathcal{I} \models^{\pi_{sr}} q_{sr}$ for some match $\pi_{sr}$, then $\mathcal{I} \models q$. 

For the first part of the lemma, we proceed exactly as illustrated in the example section 
and use the canonical model $\mathcal{I}$ and the match $\pi$ to guide the rewriting steps. We first build 
a collapsing $q_{co} \in \mathsf{co}(q)$ as described in the proof of Lemma 15 such that $\mathcal{I} \models^{\pi_{co}} q_{co}$ for $\pi_{co}$ 
an injection modulo $\approx$. Since $\mathcal{I}$ is canonical, paths between different trees can only occur 
due to non-simple roles, and thus we can replace each role atom that uses such a short-cut 
with two or three role atoms such that these roots are explicitly included in the query (cf. 
the query and match in Figure 4 and the obtained split rewriting with a split match 
in Figure 7). The second part of the lemma follows immediately from the fact that we use 
only transitive sub-roles in the replacement. 

Lemma 17. Let $\mathcal{I}$ be a model of $\mathcal{K}$. 

1. If $\mathcal{I}$ is canonical and $\mathcal{I} \models q$, then there is a pair $(q_{lr}, R) \in \mathsf{lr}_{\mathcal{K}}(q)$ and a mapping $\pi_{lr}$ 
such that $\mathcal{I} \models^{\pi_{lr}} q_{lr}$, $\pi_{lr}$ is an injection modulo $\approx$, $R$ is the root splitting induced by 
$\pi_{lr}$ and, for each $r(t, t) \in q_{lr}$, $t \in R$. 

2. If $(q_{lr}, R) \in \mathsf{lr}_{\mathcal{K}}(q)$ and $\mathcal{I} \models^{\pi_{lr}} q_{lr}$ for some match $\pi_{lr}$, then $\mathcal{I} \models q$. 

The second part is again straightforward, given that we can only use transitive sub-roles 
in the loop rewriting. For the first part, we proceed again as described in the examples 
section and use the canonical model $\mathcal{I}$ and the match $\pi$ to guide the rewriting process. We 
first build a split rewriting $q_{sr}$ and its root splitting $R$ as described in the proof of Lemma 16 
such that $(q_{sr}, R) \in \mathsf{sr}_{\mathcal{K}}(q)$ and $\mathcal{I} \models^{\pi_{sr}} q_{sr}$ for a split match $\pi_{sr}$. Since $\mathcal{I}$ is a canonical 
model, it has a forest base $\mathcal{J}$. In a forest base, non-root nodes cannot be successors of 
themselves, so each such loop is a short-cut due to some transitive role. An element that 
is, say, $r$-related to itself has, therefore, a neighbor that is both an $r$- and $\mathsf{Inv}(r)$-successor. 
Depending on whether this neighbor is already in the range of the match, we can either 
re-use an existing variable or introduce a new one, when making this path explicit (cf. the 
loop rewriting depicted in Figure 8 obtained from the split rewriting shown in Figure 7). 

Lemma 18. Let $\mathcal{I}$ be a model of $\mathcal{K}$.

1. If $\mathcal{I}$ is canonical and $\mathcal{I} \models q$, then there is a pair
$(q_{fr}, R) \in \mathsf{fr}_{\mathcal{K}}(q)$ such that $\mathcal{I} \models^{\pi_{fr}} q_{fr}$ for a forest match
$\pi_{fr}$, $R$ is the induced root splitting of $\pi_{fr}$, and $\pi_{fr}$ is an injection modulo
$\stackrel{*}{\approx}$.

2. If $(q_{fr}, R) \in \mathsf{fr}_{\mathcal{K}}(q)$ and $\mathcal{I} \models^{\pi_{fr}} q_{fr}$ for some match
$\pi_{fr}$, then $\mathcal{I} \models q$.

The main challenge is again the proof of (1) and we just give a short idea of it here.
At this point, we know from Lemma 17 that we can use a query $q_{\ell r}$ for which there is a
root splitting $R$ and a split match $\pi_{\ell r}$. Since $\pi_{\ell r}$ is a split match, the match for each
such sub-query is restricted to a tree and thus we can transform each sub-query of $q_{\ell r}$ induced
by a term $t$ in the root choice separately. The following example is meant to illustrate why
the given bound of $\sharp(\mathsf{Vars}(q))$ on the number of new variables and role atoms that can be
introduced in a forest rewriting suffices. Figure 10 depicts the representation of a tree from
a canonical model, where we use only the second part of the names for the elements, e.g.,
we use just e instead of (a, e). For simplicity, we also do not indicate the concepts and
roles that label the nodes and edges, respectively. We use black color to indicate the nodes
and edges that are used in the match for a query and dashed lines for short-cuts due to
transitive roles. In the example, the grey edges are also those that belong to the forest base
and the query match uses only short-cuts.



Figure 10: A part of a representation of a canonical model, where the black nodes and 
edges are used in a match for a query and dashed edges indicate short-cuts due 
to transitive roles. 



The forest rewriting aims at making the short-cuts more explicit by replacing them with 
as few edges as necessary to obtain a tree match. In order to do this, we need to include 
the "common ancestors" in the forest base between each two nodes used in the match. For 
$w, w' \in \mathbb{N}^*$, we therefore define the longest common prefix (LCP) of $w$ and $w'$ as the
longest $\hat{w} \in \mathbb{N}^*$ such that $\hat{w}$ is a prefix of both $w$ and $w'$. For a forest rewriting,
we now determine the LCPs of any two nodes in the range of the match and add a variable for
those LCPs that are not yet in the range of the match to the set $V$ of new variables used in the
forest rewriting. In the example from Figure 10 the set $V$ contains a single variable $v_1$ for the
node 1.
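
The computation of the nodes that need a new variable can be sketched in a few lines of
Python. This is only an illustrative sketch under our own encoding assumptions: a node of
a tree is a tuple of natural numbers (the root e is the empty tuple), and the names lcp and
new_lcp_nodes are hypothetical helpers that do not occur in the formal development.

```python
from itertools import combinations

def lcp(w1, w2):
    """Longest common prefix of two nodes given as tuples over the naturals."""
    prefix = []
    for a, b in zip(w1, w2):
        if a != b:
            break
        prefix.append(a)
    return tuple(prefix)

def new_lcp_nodes(matched):
    """LCPs of any two matched nodes that are not themselves in the range of the match."""
    matched = set(matched)
    return {lcp(u, v) for u, v in combinations(matched, 2)} - matched

# Nodes used by the match in the example of Figure 10: e (the root), 111 and 12.
matched_nodes = {(), (1, 1, 1), (1, 2)}
print(new_lcp_nodes(matched_nodes))  # {(1,)}
```

For the example, the only LCP outside the range of the match is the node 1, which agrees
with the single new variable $v_1$.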

We now explicate the short-cuts as follows: for any edge used in the match, e.g., the 
edge from e to 111 in the example, we define its path as the sequence of elements on the 
path in the forest base, e.g., the path for the edge from e to 111 is e, 1, 11, 111. The relevant 
path is obtained by dropping all elements from the path that are not in the range of the 
mapping or correspond to a variable in the set V, resulting in a relevant path of e, 1, 111 
for the example. We now replace the role atom that was matched to the edge from e to 111 
with two role atoms such that the match uses the edge from e to 1 and from 1 to 111. An 
appropriate transitive sub-role exists since otherwise there could not be a short-cut. Similar 
arguments can be used to replace the role atom mapped to the edge from 111 to 12 and 
for the one that is mapped to the edge from e to 12, resulting in a match as represented 
by Figure 11. The given restriction on the cardinality of the set V is no limitation since 
the number of LCPs in the set V is maximal if there is no pair of nodes such that one is 
an ancestor of the other. We can see these nodes as n leaf nodes of a tree that is at least 
binarily branching. Since such a tree can have at most n inner nodes, we need at most n 
new variables for a query in n variables.

Figure 11: The match for a forest rewriting obtained from the example given in Figure 10.

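The step from a path in the forest base to its relevant path can be sketched as follows.
The sketch again assumes that nodes are encoded as tuples of natural numbers with the root
e as the empty tuple; tree_path and relevant_path are hypothetical helper names, and the
set keep stands for the range of the match together with the nodes for the variables in V.

```python
def tree_path(u, v):
    """Nodes on the path from u to v in a tree whose nodes are tuples over the naturals."""
    k = 0
    while k < min(len(u), len(v)) and u[k] == v[k]:
        k += 1
    up = [u[:i] for i in range(len(u), k - 1, -1)]    # from u up to the common ancestor
    down = [v[:i] for i in range(k + 1, len(v) + 1)]  # from below the ancestor down to v
    return up + down

def relevant_path(u, v, keep):
    """Keep only the nodes that are matched or that belong to an added LCP variable."""
    return [w for w in tree_path(u, v) if w in keep]

keep = {(), (1, 1, 1), (1, 2), (1,)}  # range of the match plus the new node 1
print(relevant_path((), (1, 1, 1), keep))      # [(), (1,), (1, 1, 1)]
print(relevant_path((1, 1, 1), (1, 2), keep))  # [(1, 1, 1), (1,), (1, 2)]
```

Each pair of consecutive nodes on a relevant path corresponds to one role atom in the
replacement, so the edge from e to 111 is replaced with two role atoms.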


For the bound on the number of role atoms that can be used in the replacement of a 
single role atom, consider, for example, the cyclic query 

$q = \{r(x_1, x_2), r(x_2, x_3), r(x_3, x_4), t(x_1, x_4)\}$,

for the knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ with $\mathcal{T} = \emptyset$,
$\mathcal{R} = \{r \sqsubseteq t\}$ with $t \in \mathsf{Trans}_{\mathcal{R}}$ and
$\mathcal{A} = \{(\exists r.(\exists r.(\exists r.\top)))(a)\}$. It is not hard to check that $\mathcal{K} \models q$.
Similarly to our running example from the previous section, there is also a single rewriting that
is true in each canonical model of the KB, which is obtained by building only a forest rewriting
and doing nothing in the other rewriting steps, except for choosing the empty set as root
splitting in the split rewriting step. In the forest rewriting, we can explicate the short-cut used
in the mapping for $t(x_1, x_4)$ by replacing $t(x_1, x_4)$ with $t(x_1, x_2), t(x_2, x_3), t(x_3, x_4)$.

By using Lemmas 15 to 18, we get the following theorem, which shows that we can use
the ground queries in $\mathsf{ground}_{\mathcal{K}}(q)$ and the queries in $\mathsf{trees}_{\mathcal{K}}(q)$ in order
to check whether $\mathcal{K}$ entails $q$, which is a well understood problem.

Theorem 19. Let $\mathcal{K}$ be a $\mathcal{SHIQ}$ knowledge base, $q$ a Boolean conjunctive query, and
$\{q_1, \ldots, q_\ell\} = \mathsf{trees}_{\mathcal{K}}(q) \cup \mathsf{ground}_{\mathcal{K}}(q)$. Then $\mathcal{K} \models q$ iff
$\mathcal{K} \models q_1 \vee \ldots \vee q_\ell$.

We now give upper bounds on the size and number of queries in $\mathsf{trees}_{\mathcal{K}}(q)$ and
$\mathsf{ground}_{\mathcal{K}}(q)$. As before, we use $\sharp(S)$ to denote the cardinality of a set $S$. The size
$|\mathcal{K}|$ ($|q|$) of a knowledge base $\mathcal{K}$ (a query $q$) is simply the number of symbols needed to
write it over the alphabet of constructors, concept names, and role names that occur in
$\mathcal{K}$ ($q$), where numbers are encoded in binary. Obviously, the number of atoms in a query is
bounded by its size, hence $\sharp(q) \le |q|$ and, for simplicity, we use $n$ as the size and the
cardinality of $q$ in what follows.

Lemma 20. Let $q$ be a Boolean conjunctive query, $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ a $\mathcal{SHIQ}$
knowledge base, $|q| := n$ and $|\mathcal{K}| := m$. Then there is a polynomial $p$ such that

1. $\sharp(\mathsf{co}(q)) \le 2^{p(n)}$ and, for each $q' \in \mathsf{co}(q)$, $|q'| \le p(n)$,

2. $\sharp(\mathsf{sr}_{\mathcal{K}}(q)) \le 2^{p(n) \cdot \log p(m)}$, and, for each $q' \in \mathsf{sr}_{\mathcal{K}}(q)$, $|q'| \le p(n)$,

3. $\sharp(\mathsf{lr}_{\mathcal{K}}(q)) \le 2^{p(n) \cdot \log p(m)}$, and, for each $q' \in \mathsf{lr}_{\mathcal{K}}(q)$, $|q'| \le p(n)$,

4. $\sharp(\mathsf{fr}_{\mathcal{K}}(q)) \le 2^{p(n) \cdot \log p(m)}$, and, for each $q' \in \mathsf{fr}_{\mathcal{K}}(q)$, $|q'| \le p(n)$,

5. $\sharp(\mathsf{trees}_{\mathcal{K}}(q)) \le 2^{p(n) \cdot \log p(m)}$, and, for each $q' \in \mathsf{trees}_{\mathcal{K}}(q)$, $|q'| \le p(n)$, and

6. $\sharp(\mathsf{ground}_{\mathcal{K}}(q)) \le 2^{p(n) \cdot \log p(m)}$, and, for each $q' \in \mathsf{ground}_{\mathcal{K}}(q)$, $|q'| \le p(n)$.

As a consequence of the above lemma, there is a bound on the number of queries in
$\mathsf{ground}_{\mathcal{K}}(q)$ and $\mathsf{trees}_{\mathcal{K}}(q)$ and it is not hard to see that the two sets can be
computed in time polynomial in $m$ and exponential in $n$.

In the next section, we present an algorithm that decides entailment of unions of con- 
junctive queries, where each of the queries is either a ground query or consists of a single 
concept atom C(x) for an existentially quantified variable x. By Theorem 19 and Lemma 20, 
such an algorithm is a decision procedure for arbitrary unions of conjunctive queries. 

5.6 Summary and Discussion 

In this section, we have presented the main technical foundations for answering (unions 
of) conjunctive queries. It is known that queries that contain non-simple roles in cycles 
among existentially quantified variables are difficult to handle. By applying the rewriting 
steps from Definition 10, we can rewrite such cyclic conjunctive queries into a set of acyclic 
and/or ground queries. Both types of queries are easier to handle and algorithms for both 
types exist. At this point, any reasoning algorithm for $\mathcal{SHIQ}^{\sqcap}$ knowledge base consistency
can be used for deciding query entailment. In order to obtain tight complexity results, we 
present in the following section a decision procedure that is based on an extension of the 
translation to looping tree automata given by Tobies (2001). 

It is worth mentioning that, for queries with only simple roles, our algorithm behaves 
exactly as the existing rewriting algorithms (i.e., the rolling-up and tuple graph technique) 
since, in this case, only the collapsing step is applicable. The need for identifying variables 
was first pointed out in the work of Horrocks et al. (1999) and is also required (although 
not mentioned) for the algorithm proposed by Calvanese et al. (1998a). 

The new rewriting steps (split, loop, and forest rewriting) are only required for and 
applicable to non-simple roles and, when replacing a role atom, only transitive sub-roles of 
the replaced role can be used. Hence the number of resulting queries is in fact not determined 
by the size of the whole knowledge base, but by the number of transitive sub-roles for the 
non-simple roles in the query. Therefore, the number of resulting queries really depends on 
the number of transitive roles and the depth of the role hierarchy for the non-simple roles 
in the query, which can usually be expected to be small.

6. The Decision Procedure 

We now devise a decision procedure for entailment of unions of Boolean conjunctive queries 
that uses, for each disjunct, the queries obtained in the rewriting process as defined in the 
previous section. Detailed proofs for the lemmas and theorems in this section can again be 
found in the appendix. For a knowledge base $\mathcal{K}$ and a union of Boolean conjunctive queries
$q_1 \vee \ldots \vee q_\ell$, we show how we can use the queries in $\mathsf{trees}_{\mathcal{K}}(q_i)$ and
$\mathsf{ground}_{\mathcal{K}}(q_i)$ for $1 \le i \le \ell$ in order to build a set of knowledge bases
$\mathcal{K}_1, \ldots, \mathcal{K}_n$ such that $\mathcal{K} \models q_1 \vee \ldots \vee q_\ell$ iff all the $\mathcal{K}_i$ are
inconsistent. This gives rise to two decision procedures: a deterministic one in which
we enumerate all $\mathcal{K}_i$, and which we use to derive a tight upper bound for the combined
complexity; and a non-deterministic one in which we guess a $\mathcal{K}_i$, and which yields a tight
upper bound for the data complexity. Recall that, for combined complexity, the knowledge
base $\mathcal{K}$ and the queries both count as input, whereas for the data complexity only the
ABox $\mathcal{A}$ counts as an input, and all other parts are assumed to be fixed.

6.1 A Deterministic Decision Procedure for Query Entailment in $\mathcal{SHIQ}$

We first define the deterministic version of the decision procedure and give an upper bound 
for its combined complexity. The given algorithm takes as input a union of connected 
conjunctive queries and works under the unique name assumption (UNA). We show after- 
wards how it can be extended to an algorithm that does not make the UNA and that takes 
arbitrary UCQs as input, and that the complexity results carry over. 

We construct a set of knowledge bases that extend the original knowledge base $\mathcal{K}$ both
w.r.t. the TBox and ABox. The extended knowledge bases are such that a given KB $\mathcal{K}$
entails a query $q$ iff all the extended KBs are inconsistent. We handle the concepts obtained
from the tree-shaped queries differently to the ground queries: the axioms we add to the 
TBox prevent matches for the tree-shaped queries, whereas the extended ABoxes contain 
assertions that prevent matches for the ground queries. 

Definition 21. Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a $\mathcal{SHIQ}$ knowledge base and
$q = q_1 \vee \ldots \vee q_\ell$ a union of Boolean conjunctive queries. We set

1. $T := \mathsf{trees}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathsf{trees}_{\mathcal{K}}(q_\ell)$,

2. $G := \mathsf{ground}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathsf{ground}_{\mathcal{K}}(q_\ell)$, and

3. $\mathcal{T}_q := \{\top \sqsubseteq \neg C \mid C(v) \in T\}$.

An extended knowledge base $\mathcal{K}_q$ w.r.t. $\mathcal{K}$ and $q$ is a tuple
$(\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}, \mathcal{A} \cup \mathcal{A}_q)$ such that $\mathcal{A}_q$ contains, for each $q' \in G$, at
least one assertion $\neg at$ with $at \in q'$. $\triangle$

Informally, the extended TBox $\mathcal{T} \cup \mathcal{T}_q$ ensures that there are no tree matches. Each
extended ABox $\mathcal{A} \cup \mathcal{A}_q$ contains, for each ground query $q'$ obtained in the rewriting process,
at least one assertion $\neg at$ with $at \in q'$ that "spoils" a match for $q'$. A model for such an
extended ABox can, therefore, not satisfy any of the ground queries. If there is a model for
any of the extended knowledge bases, we know that this is a counter-model for the original
query.
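
The choice of the negated assertions can be illustrated with a small Python sketch that
enumerates all extended ABoxes for a given set of ground queries. The encoding is an
assumption made only for this illustration: atoms are plain strings, a negated assertion is
a pair ("not", atom), and the function names are hypothetical.

```python
from itertools import chain, combinations, product

def nonempty_subsets(atoms):
    """All non-empty subsets of a ground query, as tuples of atoms."""
    atoms = sorted(atoms)
    return chain.from_iterable(combinations(atoms, k) for k in range(1, len(atoms) + 1))

def extended_aboxes(abox, ground_queries):
    """Yield each A ∪ A_q where A_q negates at least one atom of every ground query."""
    seen = set()
    for choice in product(*(nonempty_subsets(q) for q in ground_queries)):
        a_q = frozenset(("not", at) for subset in choice for at in subset)
        if a_q not in seen:  # different choices may lead to the same A_q
            seen.add(a_q)
            yield frozenset(abox) | a_q

abox = {"r(a,b)"}
ground = [{"C(a)", "r(a,b)"}, {"D(b)"}]
print(len(list(extended_aboxes(abox, ground))))  # 3
```

The first ground query offers three non-empty sets of atoms to negate and the second one
only a single choice, which gives three extended ABoxes in this toy example.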

We can now use the extended knowledge bases in order to define the deterministic
version of our algorithm for deciding entailment of unions of Boolean conjunctive queries in
$\mathcal{SHIQ}$.

Definition 22. Given a $\mathcal{SHIQ}$ knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ and a union of
connected Boolean conjunctive queries $q$ as input, the algorithm answers "$\mathcal{K}$ entails $q$" if each
extended knowledge base w.r.t. $\mathcal{K}$ and $q$ is inconsistent and it answers "$\mathcal{K}$ does not entail $q$"
otherwise.

$\triangle$
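
The control flow of Definition 22 can be sketched as follows. The consistency test is passed
in as a function, since a real implementation would call a reasoner for extended knowledge
bases; the checker toy_is_consistent used here is only a propositional stand-in that we
assume for the sake of a runnable example, and it performs no TBox reasoning.

```python
def entails(extended_kbs, is_consistent):
    """K entails q iff every extended knowledge base is inconsistent."""
    return not any(is_consistent(kb) for kb in extended_kbs)

def toy_is_consistent(assertions):
    """Stand-in check: inconsistent only if some atom occurs positively and negated."""
    return not any(("not", at) in assertions for at in assertions)

# One extended KB each; atoms are strings, negated assertions are ("not", atom) pairs.
kbs_entailed = [frozenset({"C(a)", ("not", "C(a)")})]
kbs_not_entailed = [frozenset({"C(a)", ("not", "D(a)")})]
print(entails(kbs_entailed, toy_is_consistent))      # True
print(entails(kbs_not_entailed, toy_is_consistent))  # False
```

A consistent extended knowledge base acts as a witness for non-entailment, so the loop can
stop at the first one it finds.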

The following lemma shows that the above described algorithm is indeed correct.

Lemma 23. Let $\mathcal{K}$ be a $\mathcal{SHIQ}$ knowledge base and $q$ a union of connected Boolean
conjunctive queries. Given $\mathcal{K}$ and $q$ as input, the algorithm from Definition 22 answers
"$\mathcal{K}$ entails $q$" iff $\mathcal{K} \models q$ under the unique name assumption.

In the proof of the if direction for the above lemma, we can use a canonical model $\mathcal{I}$
of $\mathcal{K}$ in order to guide the rewriting process. For the only if direction, we assume to the
contrary of what is to be shown that there is no consistent extended knowledge base, but
$\mathcal{K} \not\models q$. We then use a model $\mathcal{I}$ of $\mathcal{K}$ such that $\mathcal{I} \not\models q$, which exists by
assumption, and show that $\mathcal{I}$ is also a model of some extended knowledge base.

6.1.1 Combined Complexity of Query Entailment in $\mathcal{SHIQ}$

According to the above lemma, the algorithm given in Definition 22 is correct. We now 
analyse its combined complexity and thereby prove that it is also terminating. 

For the complexity analysis, we assume, as usual (Hustadt et al., 2005; Calvanese, 
De Giacomo, Lembo, Lenzerini, & Rosati, 2006; Ortiz et al., 2006b), that all concepts in 
concept atoms and ABox assertions are literals, i.e., concept names or negated concept 
names. If the input query or ABox contains non-literal atoms or assertions, we can easily 
transform these into literal ones in a truth preserving way: for each concept atom $C(t)$ in
the query where $C$ is a non-literal concept, we introduce a new atomic concept $A_C \in N_C$,
add the axiom $C \sqsubseteq A_C$ to the TBox, and replace $C(t)$ with $A_C(t)$; for each non-literal
concept assertion $C(a)$ in the ABox, we introduce a new atomic concept $A_C \in N_C$, add
an axiom $A_C \sqsubseteq C$ to the TBox, and replace $C(a)$ with $A_C(a)$. Such a transformation is
obviously polynomial, so without loss of generality, it is safe to assume that the ABox and 
query contain only literal concepts. This has the advantage that the size of each atom and 
ABox assertion is constant. 
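
A minimal sketch of this transformation is given below. It makes several assumptions
that are not part of the paper: concepts are encoded as strings (concept names) or nested
tuples, only concept atoms and concept assertions are passed in, the generated names A0,
A1, ... are assumed not to occur elsewhere, and axioms are represented as triples.

```python
def is_literal(concept):
    """A literal is a concept name or a negated concept name."""
    if isinstance(concept, str):
        return True
    return len(concept) == 2 and concept[0] == "not" and isinstance(concept[1], str)

def normalise(query_atoms, abox_assertions):
    """Replace non-literal concepts by fresh names; return query, ABox and new axioms."""
    names, axioms = {}, []

    def name_for(concept):
        if concept not in names:
            names[concept] = f"A{len(names)}"
        return names[concept]

    def rewrite(pairs, axiom_for):
        result = []
        for concept, term in pairs:
            if not is_literal(concept):
                fresh = name_for(concept)
                axiom = axiom_for(concept, fresh)
                if axiom not in axioms:
                    axioms.append(axiom)
                concept = fresh
            result.append((concept, term))
        return result

    new_query = rewrite(query_atoms, lambda c, a: (c, "subsumed_by", a))    # C ⊑ A_C
    new_abox = rewrite(abox_assertions, lambda c, a: (a, "subsumed_by", c))  # A_C ⊑ C
    return new_query, new_abox, axioms

query = [(("some", "r", "B"), "x"), ("C", "y")]
abox = [(("and", "C", "D"), "a"), (("not", "C"), "b")]
print(normalise(query, abox))
```

Note the direction of the added axioms: the query side uses $C \sqsubseteq A_C$, the ABox side
uses $A_C \sqsubseteq C$.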

Since our algorithm involves checking the consistency of a $\mathcal{SHIQ}^{\sqcap}$ knowledge base,
we analyse the complexity of this reasoning service. Tobies (2001) shows an ExpTime
upper bound for deciding the consistency of $\mathcal{SHIQ}$ knowledge bases (even with binary
coding of numbers) by translating a $\mathcal{SHIQ}$ KB to an equisatisfiable $\mathcal{ALCQI}b$ knowledge
base. The $b$ stands for safe Boolean role expressions built from $\mathcal{ALCQI}b$ roles using the
operators $\sqcap$ (role intersection), $\sqcup$ (role union), and $\neg$ (role negation/complement) such
that, when transformed into disjunctive normal form, every disjunct contains at least one
non-negated conjunct. Given a query $q$ and a $\mathcal{SHIQ}$ knowledge base
$\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$, we reduce query entailment to deciding knowledge base consistency of
an extended $\mathcal{SHIQ}^{\sqcap}$ knowledge base $\mathcal{K}_q = (\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}, \mathcal{A} \cup \mathcal{A}_q)$.
Recall that $\mathcal{T}_q$ and $\mathcal{A}_q$ are the only parts that contain role conjunctions and that we use role
negation only in ABox assertions. We extend the translation given for $\mathcal{SHIQ}$ so that it can be
used for deciding the consistency of $\mathcal{SHIQ}^{\sqcap}$ KBs. Although the translation works for all
$\mathcal{SHIQ}^{\sqcap}$ KBs, we assume the input KB to be of exactly the form of extended knowledge bases
as described above. This is so because the translation for unrestricted $\mathcal{SHIQ}^{\sqcap}$ is no longer
polynomial, as in the case of $\mathcal{SHIQ}$, but exponential in the size of the longest role conjunction
under a universal quantifier. Since role conjunctions occur only in the extended ABox and
TBox, and since the size of each role conjunction is, by Lemma 20, polynomial in the size of
$q$, the translation is only exponential in the size of the query in the case of extended knowledge
bases.

We assume here, as usual, that all concepts are in negation normal form (NNF); any
concept can be transformed in linear time into an equivalent one in NNF by pushing negation
inwards, making use of de Morgan's laws and the duality between existential and universal
restrictions, and between atmost and atleast number restrictions ($\le n\, r.C$ and $\ge n\, r.C$
respectively) (Horrocks et al., 2000). For a concept $C$, we use $\dot{\neg}C$ to denote the NNF of
$\neg C$.
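
Pushing negation inwards can be sketched as a short recursive function. The tuple
encoding of concepts and the constructor tags ("and", "or", "some", "all", "atleast",
"atmost", "TOP", "BOT") are assumptions of this sketch, not notation from the paper.

```python
def nnf(c):
    """Negation normal form of a concept given as a string or a nested tuple."""
    if isinstance(c, str):
        return c
    op = c[0]
    if op in ("and", "or"):
        return (op, nnf(c[1]), nnf(c[2]))
    if op in ("some", "all"):
        return (op, c[1], nnf(c[2]))
    if op in ("atleast", "atmost"):
        return (op, c[1], c[2], nnf(c[3]))
    if op != "not":
        raise ValueError(f"unknown constructor: {op!r}")
    d = c[1]  # the negated concept; push the negation one level inwards
    if isinstance(d, str):
        return {"TOP": "BOT", "BOT": "TOP"}.get(d, c)
    if d[0] == "not":
        return nnf(d[1])
    if d[0] == "and":   # de Morgan
        return ("or", nnf(("not", d[1])), nnf(("not", d[2])))
    if d[0] == "or":    # de Morgan
        return ("and", nnf(("not", d[1])), nnf(("not", d[2])))
    if d[0] == "some":
        return ("all", d[1], nnf(("not", d[2])))
    if d[0] == "all":
        return ("some", d[1], nnf(("not", d[2])))
    if d[0] == "atmost":   # not (<= n r.C) is equivalent to >= (n+1) r.C
        return ("atleast", d[1] + 1, d[2], nnf(d[3]))
    if d[0] == "atleast":  # not (>= n r.C) is <= (n-1) r.C, and bottom for n = 0
        return "BOT" if d[1] == 0 else ("atmost", d[1] - 1, d[2], nnf(d[3]))
    raise ValueError(f"unknown constructor: {d[0]!r}")

print(nnf(("not", ("and", "A", ("all", "r", "B")))))
# ('or', ('not', 'A'), ('some', 'r', ('not', 'B')))
```

In the result, negation occurs only directly in front of concept names.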

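We define the closure $\mathsf{cl}(C, \mathcal{R})$ of a concept $C$ w.r.t. a role hierarchy $\mathcal{R}$ as the
smallest set satisfying the following conditions:

• if $D$ is a sub-concept of $C$, then $D \in \mathsf{cl}(C, \mathcal{R})$,

• if $D \in \mathsf{cl}(C, \mathcal{R})$, then $\dot{\neg}D \in \mathsf{cl}(C, \mathcal{R})$,

• if $\forall r.D \in \mathsf{cl}(C, \mathcal{R})$, $s \sqsubseteq^{*}_{\mathcal{R}} r$, and $s \in \mathsf{Trans}_{\mathcal{R}}$, then
$\forall s.D \in \mathsf{cl}(C, \mathcal{R})$.

We now show how we can extend the translation from $\mathcal{SHIQ}$ to $\mathcal{ALCQI}b$ given by
Tobies. We first consider $\mathcal{SHIQ}^{\sqcap}$-concepts and then extend the translation to KBs.

Definition 24. For a role hierarchy $\mathcal{R}$ and roles $r, r_1, \ldots, r_n$, let

$\uparrow(r, \mathcal{R}) = \bigsqcap_{r \sqsubseteq^{*}_{\mathcal{R}} s} s$ and
$\uparrow(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R}) = \uparrow(r_1, \mathcal{R}) \sqcap \ldots \sqcap \uparrow(r_n, \mathcal{R})$.

Please note that, since $r \sqsubseteq^{*}_{\mathcal{R}} r$, $r$ occurs in $\uparrow(r, \mathcal{R})$.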
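
The upward closure of Definition 24 can be sketched as a reachability computation. In this
sketch the role hierarchy is assumed to be given as a set of pairs (sub, super) that is
already closed under inverses, a role conjunction is represented by the set of its
conjuncts, and the names super_roles and up are hypothetical.

```python
def super_roles(role, hierarchy):
    """All roles s with role ⊑* s, including the role itself."""
    found, todo = {role}, [role]
    while todo:
        current = todo.pop()
        for sub, sup in hierarchy:
            if sub == current and sup not in found:
                found.add(sup)
                todo.append(sup)
    return found

def up(roles, hierarchy):
    """The conjuncts of ↑(r_1 ⊓ ... ⊓ r_n, R) as a set of roles."""
    result = set()
    for r in roles:
        result |= super_roles(r, hierarchy)
    return result

hierarchy = {("r", "s"), ("s", "t"), ("u", "t")}
print(sorted(up(["r"], hierarchy)))       # ['r', 's', 't']
print(sorted(up(["r", "u"], hierarchy)))  # ['r', 's', 't', 'u']
```

Since every role is a super-role of itself, each $r_i$ is among the returned conjuncts.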

Lemma 25. Let $\mathcal{R}$ be a role hierarchy, and $r_1, \ldots, r_n$ roles. For every interpretation
$\mathcal{I}$ such that $\mathcal{I} \models \mathcal{R}$, it holds that
$(\uparrow(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R}))^{\mathcal{I}} = (r_1 \sqcap \ldots \sqcap r_n)^{\mathcal{I}}$.

With the extended definition of $\uparrow$ on role conjunctions, we can now adapt the definition
(Def. 6.22) that Tobies provides for translating $\mathcal{SHIQ}$-concepts into $\mathcal{ALCQI}b$-concepts.

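Definition 26. Let $C$ be a $\mathcal{SHIQ}^{\sqcap}$-concept in NNF and $\mathcal{R}$ a role hierarchy. For every
concept $\forall(r_1 \sqcap \ldots \sqcap r_n).D \in \mathsf{cl}(C, \mathcal{R})$, let $X_{r_1 \sqcap \ldots \sqcap r_n, D} \in N_C$ be a
unique concept name that does not occur in $\mathsf{cl}(C, \mathcal{R})$. Given a role hierarchy $\mathcal{R}$, we define
the function $\mathsf{tr}$ inductively on the structure of concepts by setting

$\mathsf{tr}(A, \mathcal{R}) = A$ for all $A \in N_C$

$\mathsf{tr}(\neg A, \mathcal{R}) = \neg A$ for all $A \in N_C$

$\mathsf{tr}(C_1 \sqcap C_2, \mathcal{R}) = \mathsf{tr}(C_1, \mathcal{R}) \sqcap \mathsf{tr}(C_2, \mathcal{R})$

$\mathsf{tr}(C_1 \sqcup C_2, \mathcal{R}) = \mathsf{tr}(C_1, \mathcal{R}) \sqcup \mathsf{tr}(C_2, \mathcal{R})$

$\mathsf{tr}(\bowtie n\,(r_1 \sqcap \ldots \sqcap r_n).D, \mathcal{R}) = (\bowtie n\, \uparrow(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R}).\mathsf{tr}(D, \mathcal{R}))$

$\mathsf{tr}(\forall(r_1 \sqcap \ldots \sqcap r_n).D, \mathcal{R}) = X_{r_1 \sqcap \ldots \sqcap r_n, D}$

$\mathsf{tr}(\exists(r_1 \sqcap \ldots \sqcap r_n).D, \mathcal{R}) = \neg(X_{r_1 \sqcap \ldots \sqcap r_n, \dot{\neg}D})$

where $\bowtie$ stands for $\le$ or $\ge$. Set
$\mathsf{tc}((r_1 \sqcap \ldots \sqcap r_n), \mathcal{R}) := \{(t_1 \sqcap \ldots \sqcap t_n) \mid t_i \sqsubseteq^{*}_{\mathcal{R}} r_i$ and
$t_i \in \mathsf{Trans}_{\mathcal{R}}$ for each $i$ such that $1 \le i \le n\}$ and define an extended TBox
$\mathcal{T}_{C,\mathcal{R}}$ as

$\mathcal{T}_{C,\mathcal{R}} = \{X_{r_1 \sqcap \ldots \sqcap r_n, D} \equiv \forall\uparrow(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R}).\mathsf{tr}(D, \mathcal{R}) \mid \forall(r_1 \sqcap \ldots \sqcap r_n).D \in \mathsf{cl}(C, \mathcal{R})\} \cup$

$\{X_{r_1 \sqcap \ldots \sqcap r_n, D} \sqsubseteq \forall\uparrow(T, \mathcal{R}).X_{T,D} \mid T \in \mathsf{tc}(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R})\}$ $\triangle$

Lemma 27. Let $C$ be a $\mathcal{SHIQ}^{\sqcap}$-concept in NNF, $\mathcal{R}$ a role hierarchy, and $\mathsf{tr}$ and
$\mathcal{T}_{C,\mathcal{R}}$ as defined in Definition 26. The concept $C$ is satisfiable w.r.t. $\mathcal{R}$ iff the
$\mathcal{ALCQI}b$-concept $\mathsf{tr}(C, \mathcal{R})$ is satisfiable w.r.t. $\mathcal{T}_{C,\mathcal{R}}$.

Given Lemma 25, the proof of Lemma 27 is a long, but straightforward extension of the
proof given by Tobies (2001, Lemma 6.23).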

We now analyse the complexity of the above described problem. Let $m := |\mathcal{R}|$ and
$r_1 \sqcap \ldots \sqcap r_n$ the longest role conjunction occurring in $C$, i.e., the maximal number of roles
that occur in a role conjunction in $C$ is $n$. The TBox $\mathcal{T}_{C,\mathcal{R}}$ can contain exponentially
many axioms in $n$ since the cardinality of the set $\mathsf{tc}((r_1 \sqcap \ldots \sqcap r_n), \mathcal{R})$ for the longest
role conjunction can only be bounded by $m^n$ because each $r_i$ can have more than one transitive
sub-role. It is not hard to check that the size of each axiom is polynomial in $|C|$. Since
deciding whether an $\mathcal{ALCQI}b$ concept $C$ is satisfiable w.r.t. an $\mathcal{ALCQI}b$ TBox $\mathcal{T}$ is an
ExpTime-complete problem (even with binary coding of numbers) (Tobies, 2001, Thm.
4.42), the satisfiability of a $\mathcal{SHIQ}^{\sqcap}$-concept $C$ can be checked in time $2^{p(m) 2^{p(n)}}$.
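
The source of the $m^n$ bound can be made concrete with a small sketch that enumerates the
set tc for a role conjunction. As before, the role hierarchy is assumed to be a set of pairs
(sub, super) closed under inverses, and the function names are hypothetical.

```python
from itertools import product

def transitive_sub_roles(role, hierarchy, transitive):
    """All transitive roles t with t ⊑* role (the role itself counts if it is transitive)."""
    found, todo = {role}, [role]
    while todo:
        current = todo.pop()
        for sub, sup in hierarchy:
            if sup == current and sub not in found:
                found.add(sub)
                todo.append(sub)
    return sorted(found & set(transitive))

def tc(roles, hierarchy, transitive):
    """One tuple (t_1, ..., t_n) per way of picking a transitive sub-role for each r_i."""
    return list(product(*(transitive_sub_roles(r, hierarchy, transitive) for r in roles)))

hierarchy = {("t1", "r1"), ("t2", "r1"), ("t3", "r2")}
transitive = {"t1", "t2", "t3"}
print(tc(["r1", "r2"], hierarchy, transitive))  # [('t1', 't3'), ('t2', 't3')]
```

Each of the n conjuncts contributes an independent choice among its transitive sub-roles,
so with at most m candidates per conjunct the product has at most $m^n$ elements.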

We now extend the translation from concepts to knowledge bases. Tobies assumes that
all role assertions in the ABox are of the form $r(a, b)$ with $r$ a role name or the inverse of a
role name. Extended ABoxes contain, however, also negated roles in role assertions, which
require a different translation. A positive role assertion such as $r(a, b)$ is translated in the
standard way by closing the role upwards. The only difference to using $\uparrow$ directly is that we
additionally split the conjunction $(\uparrow(r, \mathcal{R}))(a, b) = (r_1 \sqcap \ldots \sqcap r_n)(a, b)$ into $n$ different
role assertions $r_1(a, b), \ldots, r_n(a, b)$, which is clearly justified by the semantics. For negated
roles in a role assertion such as $\neg r(a, b)$, we close the role downwards instead of upwards and
add a role atom $\neg s(a, b)$ for each sub-role $s$ of $r$. This is again justified by the semantics. Let
$\mathcal{K} = (\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}, \mathcal{A} \cup \mathcal{A}_q)$ be an extended knowledge base. More precisely, we set

$\mathsf{tr}(\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}) := \{\mathsf{tr}(C, \mathcal{R}) \sqsubseteq \mathsf{tr}(D, \mathcal{R}) \mid C \sqsubseteq D \in \mathcal{T} \cup \mathcal{T}_q\}$,

$\mathsf{tr}(\mathcal{A} \cup \mathcal{A}_q, \mathcal{R}) := \{(\mathsf{tr}(C, \mathcal{R}))(a) \mid C(a) \in \mathcal{A} \cup \mathcal{A}_q\} \cup$
$\{s(a, b) \mid r(a, b) \in \mathcal{A} \cup \mathcal{A}_q$ and $r \sqsubseteq^{*}_{\mathcal{R}} s\} \cup$
$\{\neg s(a, b) \mid \neg r(a, b) \in \mathcal{A} \cup \mathcal{A}_q$ and $s \sqsubseteq^{*}_{\mathcal{R}} r\}$,

and we use $\mathsf{tr}(\mathcal{K}, \mathcal{R})$ to denote the $\mathcal{ALCQI}b$ knowledge base
$(\mathsf{tr}(\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}), \mathsf{tr}(\mathcal{A} \cup \mathcal{A}_q, \mathcal{R}))$.

For the complexity of deciding the consistency of a translated $\mathcal{SHIQ}^{\sqcap}$ knowledge base,
we can apply the same arguments as above for concept satisfiability, which gives the following
result:

Lemma 28. Given a $\mathcal{SHIQ}^{\sqcap}$ knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ where $m := |\mathcal{K}|$ and
the size of the longest role conjunction is $n$, we can decide consistency of $\mathcal{K}$ in deterministic
time $2^{p(m) 2^{p(n)}}$ with $p$ a polynomial.

We are now ready to show that the algorithm given in Definition 22 runs in deterministic
time single exponential in the size of the input KB and double exponential in the size of
the input query.

Lemma 29. Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a $\mathcal{SHIQ}$ knowledge base with $m = |\mathcal{K}|$ and $q$ a
union of connected Boolean conjunctive queries with $n = |q|$. Given $\mathcal{K}$ and $q$ as input, the
algorithm given in Definition 22 decides whether $\mathcal{K} \models q$ under the unique name assumption in
deterministic time in $2^{p(m) 2^{p(n)}}$.

In the proof of the above lemma, we show that there is some polynomial $p$ such that we
have to check at most $2^{p(m) 2^{p(n)}}$ extended knowledge bases for consistency and that each
consistency check can be done in this time bound as well.

More precisely, let $q = q_1 \vee \ldots \vee q_\ell$,
$T = \mathsf{trees}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathsf{trees}_{\mathcal{K}}(q_\ell)$, and
$G = \mathsf{ground}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathsf{ground}_{\mathcal{K}}(q_\ell)$. Together with Lemma 20, we get
that $\sharp(T)$ and $\sharp(G)$ are bounded by $2^{p(n) \cdot \log p(m)}$ for some polynomial $p$ and that the
size of each query in $G$ and $T$ is polynomial in $n$. Each of the $2^{p(n) \cdot \log p(m)}$ ground queries
in $G$ contributes at most $p(n)$ negated assertions to an extended ABox $\mathcal{A}_q$. Hence, there are
at most $2^{p(m) 2^{p(n)}}$ extended ABoxes $\mathcal{A}_q$ and, therefore, $2^{p(m) 2^{p(n)}}$ extended
knowledge bases that have to be tested for consistency.

Given the bounds on the cardinalities of $T$ and $G$ and the fact that the size of each
query in $T$ and $G$ is polynomial in $n$, it is not hard to check that the size of each extended
knowledge base $\mathcal{K}_q = (\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}, \mathcal{A} \cup \mathcal{A}_q)$ is bounded by
$2^{p(n) \cdot \log p(m)}$ and that each $\mathcal{K}_q$ can be computed in this time bound as well. Since only
the extended parts contain role conjunctions and the number of roles in a role conjunction is
polynomial in $n$, there is a polynomial $p$ such that

1. $|\mathsf{tr}(\mathcal{T}, \mathcal{R})| \le p(m)$,

2. $|\mathsf{tr}(\mathcal{T}_q, \mathcal{R})| \le 2^{p(n) \cdot \log p(m)}$,

3. $|\mathsf{tr}(\mathcal{A}, \mathcal{R})| \le p(m)$,

4. $|\mathsf{tr}(\mathcal{A}_q, \mathcal{R})| \le 2^{p(n) \cdot \log p(m)}$, and, hence,

5. $|\mathsf{tr}(\mathcal{K}_q, \mathcal{R})| \le 2^{p(n) \cdot \log p(m)}$.

By Lemma 28, each consistency check can be done in time $2^{p(m) 2^{p(n)}}$ for some polynomial
$p$. Since we have to check at most $2^{p(m) 2^{p(n)}}$ extended knowledge bases for consistency, and
each check can be done in time $2^{p(m) 2^{p(n)}}$, we obtain the desired upper bound.

We now show that this result carries over even when we do not restrict interpretations 
to the unique name assumption. 

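Definition 30. Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a $\mathcal{SHIQ}$ knowledge base and $q$ a
$\mathcal{SHIQ}$ union of Boolean conjunctive queries. For a partition $\mathcal{P}$ of $\mathsf{Inds}(\mathcal{A})$, a
knowledge base $\mathcal{K}^{\mathcal{P}} = (\mathcal{T}, \mathcal{R}, \mathcal{A}^{\mathcal{P}})$ and a query $q^{\mathcal{P}}$ are called an
$\mathcal{A}$-partition w.r.t. $\mathcal{K}$ and $q$ if $\mathcal{A}^{\mathcal{P}}$ and $q^{\mathcal{P}}$ are obtained from $\mathcal{A}$ and $q$ as
follows:
For each $P \in \mathcal{P}$

1. Choose one individual name $a \in P$.

2. For each $b \in P$, replace each occurrence of $b$ in $\mathcal{A}$ and $q$ with $a$.

$\triangle$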

Please note that w.l.o.g. we assume that all constants that occur in the query occur 
in the ABox as well and that thus a partition of the individual names in the ABox also 
partitions the query. 

Lemma 31. Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a $\mathcal{SHIQ}$ knowledge base and $q$ a union of
Boolean conjunctive queries. $\mathcal{K} \not\models q$ without making the unique name assumption iff there
is an $\mathcal{A}$-partition $\mathcal{K}^{\mathcal{P}} = (\mathcal{T}, \mathcal{R}, \mathcal{A}^{\mathcal{P}})$ and $q^{\mathcal{P}}$ w.r.t. $\mathcal{K}$ and $q$ such
that $\mathcal{K}^{\mathcal{P}} \not\models q^{\mathcal{P}}$ under the unique name assumption.

Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a knowledge base in a Description Logic $\mathcal{DL}$, $\mathcal{C}$ be
the complexity class such that deciding whether $\mathcal{K} \models q$ under the unique name assumption is
in $\mathcal{C}$, and let $n = 2^{|\mathcal{A}|}$. Since the number of partitions for an ABox is at most exponential
in the number of individual names that occur in the ABox, the following is a straightforward
consequence of the above lemma: for a Boolean conjunctive $\mathcal{DL}$ query $q$, deciding whether
$\mathcal{K} \models q$ without making the unique name assumption can be reduced to deciding $n$ times a
problem in $\mathcal{C}$.
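
The construction of $\mathcal{A}$-partitions can be sketched by enumerating all partitions of the
individual names and replacing every name by a representative of its block. The encoding is
an assumption of this sketch: assertions and query atoms are pairs (predicate, arguments),
the first element of a block serves as representative, and terms that are not individual
names (such as variables) are left unchanged.

```python
def partitions(items):
    """Yield all partitions of a list as lists of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # put `first` into one of the existing blocks ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + smaller

def apply_partition(partition, assertions):
    """Replace each individual by the representative of its block."""
    rep = {b: block[0] for block in partition for b in block}
    return {(pred, tuple(rep.get(t, t) for t in args)) for pred, args in assertions}

print(len(list(partitions(["a", "b", "c"]))))  # 5
abox = {("r", ("a", "b")), ("C", ("b",))}
print(apply_partition([["a", "b"], ["c"]], abox) == {("r", ("a", "a")), ("C", ("a",))})  # True
```

The partition into singleton blocks leaves the ABox and the query unchanged, so the case
in which all names denote different elements is covered as well.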

In order to extend our algorithm to unions of possibly unconnected Boolean conjunctive 
queries, we first transform the input query q into conjunctive normal form (CNF). We 
then check entailment for each conjunct $q_i$, which is now a union of connected Boolean
conjunctive queries. The algorithm returns "$\mathcal{K}$ entails $q$" if each entailment check succeeds
and it answers "$\mathcal{K}$ does not entail $q$" otherwise. By Lemma 5 and Lemma 23, the algorithm
is correct.

Let $\mathcal{K}$ be a knowledge base in a Description Logic $\mathcal{DL}$, $q$ a union of connected Boolean
conjunctive $\mathcal{DL}$ queries, and $\mathcal{C}$ the complexity class such that deciding whether
$\mathcal{K} \models q$ is in $\mathcal{C}$. Let $q'$ be a union of possibly unconnected Boolean conjunctive queries and
$\mathsf{cnf}(q')$ the CNF of $q'$. Since the number of conjuncts in $\mathsf{cnf}(q')$ is at most exponential
in $|q'|$, deciding whether $\mathcal{K} \models q'$ can be reduced to deciding $n$ times a problem in
$\mathcal{C}$, with $n = 2^{p(|q'|)}$ and $p$ a polynomial.
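
The transformation into CNF can be sketched as follows. The sketch assumes that a
conjunctive query is a list of atoms (predicate, arguments), that two atoms are connected
when they share a term, and that a union of conjunctive queries is a list of such queries;
components and cnf are hypothetical helper names.

```python
from itertools import product

def components(atoms):
    """Split a conjunctive query into groups of atoms linked through shared terms."""
    groups = []  # pairs (terms of the group, atoms of the group)
    for atom in atoms:
        terms, members, rest = set(atom[1]), [atom], []
        for g_terms, g_atoms in groups:
            if g_terms & terms:          # the new atom links this group
                terms |= g_terms
                members = g_atoms + members
            else:
                rest.append((g_terms, g_atoms))
        groups = rest + [(terms, members)]
    return [tuple(g_atoms) for _, g_atoms in groups]

def cnf(ucq):
    """Each conjunct picks one connected component from every disjunct of the union."""
    return list(product(*(components(q) for q in ucq)))

q1 = [("r", ("x", "y")), ("C", ("z",)), ("s", ("y", "w"))]  # two components
q2 = [("D", ("u",))]                                          # one component
for conjunct in cnf([q1, q2]):
    print(conjunct)
```

Every conjunct of the result is read as a disjunction of connected queries. The number of
conjuncts is the product of the component counts of the disjuncts, which is where the
exponential bound comes from.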

The above observation together with the results from Lemma 29 gives the following 
general result: 

Theorem 32. Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a $\mathcal{SHIQ}$ knowledge base with $m = |\mathcal{K}|$ and
$q$ a union of Boolean conjunctive queries with $n = |q|$. Deciding whether $\mathcal{K} \models q$ can be done
in deterministic time in $2^{p(m) 2^{p(n)}}$.

A corresponding lower bound follows from the work by Lutz (2007). Hence the above
result is tight. The result improves the known co-3NExpTime upper bound for the setting
where the roles in the query are restricted to simple ones (Ortiz, Calvanese, & Eiter, 2006a).

Corollary 33. Let $\mathcal{K}$ be a $\mathcal{SHIQ}$ knowledge base with $m = |\mathcal{K}|$ and $q$ a union of Boolean
conjunctive queries with $n = |q|$. Deciding whether $\mathcal{K} \models q$ is a 2ExpTime-complete problem.

Regarding query answering, we refer back to the end of Section 2.2, where we explain
that deciding which tuples belong to the set of answers can be checked with at most
$m_{\mathcal{A}}^k$ entailment tests, where $k$ is the number of answer variables in the query and
$m_{\mathcal{A}}$ is the number of individual names in $\mathsf{Inds}(\mathcal{A})$. Hence, at least theoretically, this is
absorbed by the combined complexity of query entailment in $\mathcal{SHIQ}$.
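
The reduction of query answering to entailment tests can be sketched as a loop over all
candidate tuples. The entailment test is passed in as a function is_entailed, a hypothetical
oracle standing in for the decision procedure; the toy oracle in the usage example is
chosen only to make the sketch runnable.

```python
from itertools import product

def candidate_tuples(individuals, k):
    """All k-tuples of individual names, one candidate per possible answer."""
    return product(sorted(individuals), repeat=k)

def answers(individuals, k, is_entailed):
    """Keep the candidate tuples for which the instantiated Boolean query is entailed."""
    return [t for t in candidate_tuples(individuals, k) if is_entailed(t)]

individuals = {"a", "b", "c"}
print(len(list(candidate_tuples(individuals, 2))))         # 9
print(answers(individuals, 2, lambda t: t[0] == t[1]))     # toy oracle
# [('a', 'a'), ('b', 'b'), ('c', 'c')]
```

With three individual names and two answer variables there are nine candidates, one
entailment test each.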

6.2 A Non-Deterministic Decision Procedure for Query Entailment in $\mathcal{SHIQ}$

In order to study the data complexity of query entailment, we devise a non-deterministic
decision procedure which provides a tight bound for the complexity of the problem. Actually,
the devised algorithm decides non-entailment of queries: we guess an extended knowledge
base $\mathcal{K}_q$, check whether it is consistent, and return "$\mathcal{K}$ does not entail $q$" if the check succeeds
and "$\mathcal{K}$ entails $q$" otherwise.

Definition 34. Let $\mathcal{T}$ be a $\mathcal{SHIQ}$ TBox, $\mathcal{R}$ a $\mathcal{SHIQ}$ role hierarchy, and $q$ a union of
Boolean conjunctive queries. Given a $\mathcal{SHIQ}$ ABox $\mathcal{A}$ as input, the algorithm guesses an
$\mathcal{A}$-partition $\mathcal{K}^{\mathcal{P}} = (\mathcal{T}, \mathcal{R}, \mathcal{A}^{\mathcal{P}})$ and $q^{\mathcal{P}}$ w.r.t.
$\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ and $q$. The query $q^{\mathcal{P}}$ is then transformed into CNF and one of the
resulting conjuncts, say $q_i^{\mathcal{P}}$, is chosen. The algorithm then guesses an extended knowledge
base $\mathcal{K}^{\mathcal{P}}_{q_i} = (\mathcal{T} \cup \mathcal{T}_{q_i}, \mathcal{R}, \mathcal{A}^{\mathcal{P}} \cup \mathcal{A}^{\mathcal{P}}_{q_i})$ w.r.t.
$\mathcal{K}^{\mathcal{P}}$ and $q_i^{\mathcal{P}}$ and returns "$\mathcal{K}$ does not entail $q$" if $\mathcal{K}^{\mathcal{P}}_{q_i}$ is consistent and it
returns "$\mathcal{K}$ entails $q$" otherwise.

$\triangle$

Compared to the deterministic version of the algorithm given in Definition 22, we do not 
make the UNA but guess a partition of the individual names. We also non-deterministically 
choose one of the conjuncts that result from the transformation into CNF. For this conjunct, 
we guess an extended ABox and check whether the extended knowledge base for the guessed
ABox is consistent, in which case each of its models is a counter-model for the query
entailment.

In its (equivalent) negated form, Lemma 23 says that $\mathcal{K} \not\models q$ iff there is an extended
knowledge base $\mathcal{K}_q$ w.r.t. $\mathcal{K}$ and $q$ such that $\mathcal{K}_q$ is consistent. Together with Lemma 31
it follows, therefore, that the algorithm from Definition 34 is correct.

6.2.1 Data Complexity of Query Entailment in $\mathcal{SHIQ}$

We now analyze the data complexity of the algorithm given in Definition 34 and show that
deciding UCQ entailment in $\mathcal{SHIQ}$ is indeed in co-NP for data complexity.

Theorem 35. Let $\mathcal{T}$ be a $\mathcal{SHIQ}$ TBox, $\mathcal{R}$ a $\mathcal{SHIQ}$ role hierarchy, and $q$ a union of
Boolean conjunctive queries. Given a $\mathcal{SHIQ}$ ABox $\mathcal{A}$ with $m_a = |\mathcal{A}|$, the algorithm from
Definition 34 decides in non-deterministic polynomial time in $m_a$ whether $\mathcal{K} \not\models q$ for
$\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$.

Clearly, the size of an ABox $\mathcal{A}^{\mathcal{P}}$ in an $\mathcal{A}$-partition is bounded by $m_a$. Since the query
is no longer an input, its size is constant and the transformation to CNF can be done in
constant time. We then non-deterministically choose one of the resulting conjuncts. Let
this conjunct be $q_i = q_{(i,1)} \vee \ldots \vee q_{(i,\ell)}$. As established in Lemma 32, the maximal size of
an extended ABox $\mathcal{A}^{\mathcal{P}}_{q_i}$ is polynomial in $m_a$. Hence,
$|\mathcal{A}^{\mathcal{P}} \cup \mathcal{A}^{\mathcal{P}}_{q_i}| \le p(m_a)$ for some polynomial $p$. Due to Lemma 20 and since the
size of $q$, $\mathcal{T}$, and $\mathcal{R}$ is fixed by assumption, the sets $\mathsf{trees}_{\mathcal{K}^{\mathcal{P}}}(q_{(i,j)})$ and
$\mathsf{ground}_{\mathcal{K}^{\mathcal{P}}}(q_{(i,j)})$ for each $j$ such that $1 \le j \le \ell$ can be computed in time
polynomial in $m_a$. From Lemma 29, we know that the translation of an extended knowledge
base into an $\mathcal{ALCQI}b$ knowledge base is polynomial in $m_a$ and a close inspection of the
algorithm by Tobies (2001) for deciding consistency of an $\mathcal{ALCQI}b$ knowledge base shows
that its runtime is also polynomial in $m_a$.

The bound given in Theorem 35 is tight since the data complexity of conjunctive query 
entailment is already co-NP-hard for the ACS fragment of SHXQ (Schaerf, 1993). 

Corollary 36. Conjunctive query entailment in SHXQ is data complete for co-NP. 

Due to the correspondence between query containment and query answering (Calvanese
et al., 1998a), the algorithm can also be used to decide containment of two unions of
conjunctive queries over a $\mathcal{SHIQ}$ knowledge base, which gives the following result:

Corollary 37. Given a $\mathcal{SHIQ}$ knowledge base $\mathcal{K}$, and two unions of conjunctive queries $q$
and $q'$, the problem whether $\mathcal{K} \models q \subseteq q'$ is decidable.

By using the result of Rosati (2006a, Thm. 11), we further show that the consistency of
a $\mathcal{SHIQ}$ knowledge base extended with (weakly-safe) Datalog rules is decidable.

Corollary 38. The consistency of $\mathcal{SHIQ}$+log-KBs (both under FOL semantics and under
NM semantics) is decidable.

7. Conclusions 

With the decision procedure presented for entailment of unions of conjunctive queries in
$\mathcal{SHIQ}$, we close a long-standing open problem. The solution has immediate consequences
on related areas, as it shows that several other open problems such as query answering,
query containment and the extension of a knowledge base with weakly safe Datalog rules
for $\mathcal{SHIQ}$ are decidable as well. Regarding combined complexity, we present a deterministic
algorithm that needs time single exponential in the size of the KB and double exponential
in the size of the query, which gives a tight upper bound for the problem. This result
shows that deciding conjunctive query entailment is strictly harder than instance checking
for $\mathcal{SHIQ}$. We further prove co-NP-completeness for data complexity. Interestingly, this
shows that regarding data complexity deciding UCQ entailment is (at least theoretically)
not harder than instance checking for $\mathcal{SHIQ}$, which was also a previously open question.

It will be part of our future work to extend this procedure to $\mathcal{SHOIQ}$, which is the
DL underlying OWL DL. We will also attempt to find more implementable algorithms for
query answering in $\mathcal{SHIQ}$. Carrying out the query rewriting steps in a more goal-directed
way will be crucial to achieving this.

Lemma (7). Let $\mathcal{K}$ be a $\mathcal{SHIQ}$ knowledge base and $q = q_1 \vee \ldots \vee q_n$ a union of conjunctive
queries, then $\mathcal{K} \not\models q$ iff there exists a canonical model $\mathcal{I}$ of $\mathcal{K}$ such that $\mathcal{I} \not\models q$.

Proof of Lemma 7. The "if" direction is trivial. 

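For the "only if" direction, since an inconsistent knowledge base entails every query, we
can assume that $\mathcal{K}$ is consistent. Hence, there is an interpretation $\mathcal{I}' = (\Delta^{\mathcal{I}'}, \cdot^{\mathcal{I}'})$ such that
$\mathcal{I}' \models \mathcal{K}$ and $\mathcal{I}' \not\models q$. From $\mathcal{I}'$, we construct a canonical model $\mathcal{I}$ for $\mathcal{K}$ and its forest base $\mathcal{J}$
as follows: we define the set $P \subseteq (\Delta^{\mathcal{I}'})^*$ of paths to be the smallest set such that

• for all $a \in \mathrm{Inds}(\mathcal{A})$, $a^{\mathcal{I}'}$ is a path;

• $d_1 \cdots d_n \cdot d$ is a path, if

  - $d_1 \cdots d_n$ is a path,

  - $(d_n, d) \in r^{\mathcal{I}'}$ for some role $r$,

  - if there is an $a \in \mathrm{Inds}(\mathcal{A})$ such that $d = a^{\mathcal{I}'}$, then $n > 2$.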

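For a path $p = d_1 \cdots d_n$, the length $\mathrm{len}(p)$ of $p$ is $n$. Now fix a set $S \subseteq \mathrm{Inds}(\mathcal{A}) \times \mathbb{N}^*$ and a
bijection $f : S \to P$ such that

(i) $\mathrm{Inds}(\mathcal{A}) \times \{\varepsilon\} \subseteq S$,

(ii) for each $a \in \mathrm{Inds}(\mathcal{A})$, $\{w \mid (a,w) \in S\}$ is a tree,

(iii) $f((a,\varepsilon)) = a^{\mathcal{I}'}$,

(iv) if $(a,w), (a,w') \in S$ with $w'$ a successor of $w$, then $f((a,w')) = f((a,w)) \cdot d$ for some
$d \in \Delta^{\mathcal{I}'}$.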

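For all $(a,w) \in S$, set $\mathrm{Tail}((a,w)) := d_n$ if $f((a,w)) = d_1 \cdots d_n$. Now, define a forest base
$\mathcal{J} = (\Delta^{\mathcal{J}}, \cdot^{\mathcal{J}})$ for $\mathcal{K}$ as follows:

(a) $\Delta^{\mathcal{J}} := S$;

(b) for each $a \in \mathrm{Inds}(\mathcal{A})$, $a^{\mathcal{J}} := (a, \varepsilon) \in S$;

(c) for each $b \in N_I \setminus \mathrm{Inds}(\mathcal{A})$, $b^{\mathcal{J}} = a^{\mathcal{J}}$ for some fixed $a \in \mathrm{Inds}(\mathcal{A})$;

(d) for each $C \in N_C$, $(a,w) \in C^{\mathcal{J}}$ if $(a,w) \in S$ and $\mathrm{Tail}((a,w)) \in C^{\mathcal{I}'}$;

(e) for all roles $r$, $((a,w), (b,w')) \in r^{\mathcal{J}}$ if either

    (I) $w = w' = \varepsilon$ and $(a^{\mathcal{I}'}, b^{\mathcal{I}'}) \in r^{\mathcal{I}'}$ or

    (II) $a = b$, $w'$ is a neighbor of $w$ and $(\mathrm{Tail}((a,w)), \mathrm{Tail}((b,w'))) \in r^{\mathcal{I}'}$.

It is clear that $\mathcal{J}$ is a forest base for $\mathcal{K}$ due to the definition of $S$ and the construction
of $\mathcal{J}$ from $S$.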
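
The inductive definition of the set of paths can be made concrete with a short sketch. Since the set of paths is infinite in general, the sketch only enumerates paths up to a given length. The function name `bounded_paths`, the length bound, and the encoding of an interpretation as a set of root elements plus a single edge set are assumptions made for illustration; in particular, the edge set is assumed to be the union of the extensions of all roles, inverse roles included.

```python
def bounded_paths(roots, edges, max_len):
    """Enumerate all paths with at most max_len elements.

    roots: set of elements that interpret ABox individuals (assumed input).
    edges: set of (d, e) pairs, the union of all role extensions
           (assumed to include the extensions of inverse roles).
    """
    found = [(d,) for d in sorted(roots)]
    frontier = list(found)
    while frontier:
        p = frontier.pop()
        if len(p) >= max_len:
            continue
        for d, e in sorted(edges):
            # A root element may only be appended to a path d_1...d_n with n > 2.
            if d == p[-1] and (e not in roots or len(p) > 2):
                q = p + (e,)
                found.append(q)
                frontier.append(q)
    return found
```

Each enumerated path corresponds to one element of the forest base; the last element of a path plays the role of Tail in the construction above.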

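Let $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ be an interpretation that is identical to $\mathcal{J}$ except that, for all non-simple
roles $r$, we set

$$r^{\mathcal{I}} = r^{\mathcal{J}} \cup \bigcup_{s \sqsubseteq^*_{\mathcal{R}} r,\; s \in \mathrm{Trans}_{\mathcal{R}}} (s^{\mathcal{J}})^+$$

It is tedious but not too hard to verify that $\mathcal{I} \models \mathcal{K}$ and that $\mathcal{J}$ is a forest base for $\mathcal{I}$. Hence
$\mathcal{I}$ is a canonical model for $\mathcal{K}$.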

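Therefore, we only have to show that $\mathcal{I} \not\models q$. Assume to the contrary that $\mathcal{I} \models q$.
Then there is some $\pi$ and $i$ with $1 \leq i \leq n$ such that $\mathcal{I} \models^{\pi} q_i$. We now define a mapping
$\pi' : \mathrm{Terms}(q_i) \to \Delta^{\mathcal{I}'}$ by setting $\pi'(t) := \mathrm{Tail}(\pi(t))$ for all $t \in \mathrm{Terms}(q_i)$. It is not difficult to
check that $\mathcal{I}' \models^{\pi'} q_i$ and hence $\mathcal{I}' \models^{\pi'} q$, which is a contradiction. □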

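Lemma (15). Let $\mathcal{I}$ be a model for $\mathcal{K}$.

1. If $\mathcal{I} \models q$, then there is a collapsing $q_{co}$ of $q$ such that $\mathcal{I} \models^{\pi_{co}} q_{co}$ for $\pi_{co}$ an injection
modulo $\stackrel{*}{\approx}$.

2. If $\mathcal{I} \models^{\pi_{co}} q_{co}$ for a collapsing $q_{co}$ of $q$, then $\mathcal{I} \models q$.

Proof of Lemma 15. For (1), let $\pi$ be such that $\mathcal{I} \models^{\pi} q$, and let $q_{co}$ be the collapsing of $q$ that
is obtained by adding an atom $t \approx t'$ for all terms $t, t' \in \mathrm{Terms}(q)$ for which $\pi(t) = \pi(t')$.
By definition of the semantics, $\mathcal{I} \models^{\pi} q_{co}$ and $\pi$ is an injection modulo $\stackrel{*}{\approx}$.

Condition (2) trivially holds since $q \subseteq q_{co}$ and hence $\mathcal{I} \models^{\pi_{co}} q$. □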
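
The argument for (1) picks the collapsing that identifies exactly the terms sharing an image under the match. A minimal sketch of this idea follows; it treats a collapsing as a partition of the query terms and ignores any further restrictions on which terms may be identified. The function names `partitions` and `induced_partition`, and the encoding of a match as a dictionary, are assumptions for illustration.

```python
def partitions(terms):
    """Yield every partition of the list `terms` as a list of blocks."""
    if not terms:
        yield []
        return
    first, rest = terms[0], terms[1:]
    for part in partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + part


def induced_partition(match):
    """Group terms by their image under `match` (a dict term -> element)."""
    blocks = {}
    for term, element in match.items():
        blocks.setdefault(element, []).append(term)
    return sorted(sorted(block) for block in blocks.values())
```

The number of partitions grows exponentially with the number of terms, in line with the exponential bound on the number of collapsings stated in Lemma 20 (1).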

Lemma (16). Let $\mathcal{I}$ be a model for $\mathcal{K}$.

1. If $\mathcal{I}$ is canonical and $\mathcal{I} \models^{\pi} q$, then there is a pair $(q_{sr}, R) \in \mathrm{sr}_{\mathcal{K}}(q)$ and a split match
$\pi_{sr}$ such that $\mathcal{I} \models^{\pi_{sr}} q_{sr}$, $R$ is the induced root splitting of $\pi_{sr}$, and $\pi_{sr}$ is an injection
modulo $\stackrel{*}{\approx}$.

2. If $(q_{sr}, R) \in \mathrm{sr}_{\mathcal{K}}(q)$ and $\mathcal{I} \models^{\pi_{sr}} q_{sr}$ for some match $\pi_{sr}$, then $\mathcal{I} \models q$.

Proof of Lemma 16. The proof of the second claim is relatively straightforward: since
$(q_{sr}, R) \in \mathrm{sr}_{\mathcal{K}}(q)$, there is a collapsing $q_{co}$ of $q$ such that $q_{sr}$ is a split rewriting of $q_{co}$.
Since all roles replaced in a split rewriting are non-simple and $\mathcal{I} \models q_{sr}$ by assumption, we
have that $\mathcal{I} \models q_{co}$. By Lemma 15 (2), we then have that $\mathcal{I} \models q$ as required.

We go through the proof of the first claim in more detail: let $q_{co}$ be in $\mathrm{co}(q)$ such
that $\mathcal{I} \models^{\pi_{co}} q_{co}$ for a match $\pi_{co}$ that is injective modulo $\stackrel{*}{\approx}$. Such a collapsing $q_{co}$ and
match $\pi_{co}$ exist due to Lemma 15. If $\pi_{co}$ is a split match w.r.t. $q$ and $\mathcal{I}$ already, we are
done, since a split match induces a root splitting $R$ and $(q_{co}, R)$ is trivially in $\mathrm{sr}_{\mathcal{K}}(q)$. If
$\pi_{co}$ is not a split match, there are at least two terms $t, t'$ with $r(t,t') \in q_{co}$ such that
$\pi_{co}(t) = (a,w)$, $\pi_{co}(t') = (a',w')$, $a \neq a'$, and $w \neq \varepsilon$ or $w' \neq \varepsilon$. We distinguish two cases:

1. Both $t$ and $t'$ are not mapped to roots, i.e., $w \neq \varepsilon$ and $w' \neq \varepsilon$. Since $\mathcal{I} \models^{\pi_{co}} r(t,t')$,
we have that $(\pi_{co}(t), \pi_{co}(t')) \in r^{\mathcal{I}}$. Since $\mathcal{I}$ is a canonical model for $\mathcal{K}$, there must be
a role $s$ with $s \sqsubseteq^*_{\mathcal{R}} r$ and $s \in \mathrm{Trans}_{\mathcal{R}}$ such that

$$\{(\pi_{co}(t), (a,\varepsilon)),\; ((a,\varepsilon), (a',\varepsilon)),\; ((a',\varepsilon), \pi_{co}(t'))\} \subseteq s^{\mathcal{I}}.$$

If there is some $\hat{t} \in \mathrm{Terms}(q_{co})$ such that $\pi_{co}(\hat{t}) = (a,\varepsilon)$, then let $u = \hat{t}$, otherwise let $u$
be a fresh variable. Similarly, if there is some $\hat{t}' \in \mathrm{Terms}(q_{co})$ such that $\pi_{co}(\hat{t}') = (a',\varepsilon)$,
then let $u' = \hat{t}'$, otherwise let $u'$ be a fresh variable. Hence, we can define a split
rewriting $q_{sr}$ of $q_{co}$ by replacing $r(t,t')$ with $s(t,u)$, $s(u,u')$, and $s(u',t')$. We then
define a new mapping $\pi_{sr}$ that agrees with $\pi_{co}$ on all terms that occur in $q_{co}$ and that
maps $u$ to $(a,\varepsilon)$ and $u'$ to $(a',\varepsilon)$.

2. Either $t$ or $t'$ is mapped to a root. W.l.o.g., let this be $t$, i.e., $\pi_{co}(t) = (a,\varepsilon)$. We can use
the same arguments as above: since $\mathcal{I} \models^{\pi_{co}} r(t,t')$, we have that $(\pi_{co}(t), \pi_{co}(t')) \in r^{\mathcal{I}}$ and,
since $\mathcal{I}$ is a canonical model for $\mathcal{K}$, there must be a role $s$ with $s \sqsubseteq^*_{\mathcal{R}} r$ and $s \in \mathrm{Trans}_{\mathcal{R}}$
such that $\{(\pi_{co}(t), (a',\varepsilon)),\; ((a',\varepsilon), \pi_{co}(t'))\} \subseteq s^{\mathcal{I}}$. If there is some $\hat{t} \in \mathrm{Terms}(q_{co})$ such
that $\pi_{co}(\hat{t}) = (a',\varepsilon)$, then let $u = \hat{t}$, otherwise let $u$ be a fresh variable. We then define
a split rewriting $q_{sr}$ of $q_{co}$ by replacing $r(t,t')$ with $s(t,u)$, $s(u,t')$ and a mapping $\pi_{sr}$
that agrees with $\pi_{co}$ on all terms that occur in $q_{co}$ and that maps $u$ to $(a',\varepsilon)$.

It immediately follows that $\mathcal{I} \models^{\pi_{sr}} q_{sr}$. We can proceed as described above for each role
atom $r(t,t')$ for which $\pi(t) = (a,w)$ and $\pi(t') = (a',w')$ with $a \neq a'$ and $w \neq \varepsilon$ or $w' \neq \varepsilon$.
This will result in a split rewriting $q_{sr}$ and a split match $\pi_{sr}$ such that $\mathcal{I} \models^{\pi_{sr}} q_{sr}$.
Furthermore, $\pi_{sr}$ is injective modulo $\stackrel{*}{\approx}$ since we only introduce a new variable when the
variable is mapped to an element that is not yet in the range of the match. Since $\pi_{sr}$ is a
split match, it induces a root splitting $R$ and, hence, $(q_{sr}, R) \in \mathrm{sr}_{\mathcal{K}}(q)$ as required. □

Lemma (17). Let $\mathcal{I}$ be a model of $\mathcal{K}$.

1. If $\mathcal{I}$ is canonical and $\mathcal{I} \models q$, then there is a pair $(q_{\ell r}, R) \in \mathrm{lr}_{\mathcal{K}}(q)$ and a mapping $\pi_{\ell r}$
such that $\mathcal{I} \models^{\pi_{\ell r}} q_{\ell r}$, $\pi_{\ell r}$ is an injection modulo $\stackrel{*}{\approx}$, $R$ is the root splitting induced by
$\pi_{\ell r}$ and, for each $r(t,t) \in q_{\ell r}$, $t \in R$.

2. If $(q_{\ell r}, R) \in \mathrm{lr}_{\mathcal{K}}(q)$ and $\mathcal{I} \models^{\pi_{\ell r}} q_{\ell r}$ for some match $\pi_{\ell r}$, then $\mathcal{I} \models q$.

Proof of Lemma 17. The proof of (2) is analogous to the one given in Lemma 16 since, by
definition of loop rewritings, all roles replaced in a loop rewriting are again non-simple.

For (1), let $(q_{sr}, R) \in \mathrm{sr}_{\mathcal{K}}(q)$ be such that $\mathcal{I} \models^{\pi_{sr}} q_{sr}$, $\pi_{sr}$ is a split match, and $R$ is
the root splitting induced by $\pi_{sr}$. Such a split rewriting $q_{sr}$ and match $\pi_{sr}$ exist due to
Lemma 16 and the canonicity of $\mathcal{I}$.

Let $r(t,t) \in q_{sr}$ for $t \notin R$. Since $R$ is the root splitting induced by $\pi_{sr}$ and since
$t \notin R$, $\pi_{sr}(t) = (a,w)$ for some $a \in \mathrm{Inds}(\mathcal{A})$ and $w \neq \varepsilon$. Now, let $\mathcal{J}$ be a forest base for
$\mathcal{I}$. We show that there exists a neighbor $d$ of $\pi_{sr}(t)$ and a role $s \in \mathrm{Trans}_{\mathcal{R}}$ such that $s \sqsubseteq^*_{\mathcal{R}} r$
and $(\pi_{sr}(t), d) \in s^{\mathcal{I}} \cap \mathrm{Inv}(s)^{\mathcal{I}}$. Since $\mathcal{I} \models^{\pi_{sr}} q_{sr}$, we have $(\pi_{sr}(t), \pi_{sr}(t)) \in r^{\mathcal{I}}$. Since $\mathcal{J}$
is a forest base and since $w \neq \varepsilon$, we have $(\pi_{sr}(t), \pi_{sr}(t)) \notin r^{\mathcal{J}}$. It follows that there is a
sequence $d_1, \ldots, d_n \in \Delta^{\mathcal{I}}$ and a role $s \in \mathrm{Trans}_{\mathcal{R}}$ such that $s \sqsubseteq^*_{\mathcal{R}} r$, $d_1 = \pi_{sr}(t) = d_n$, and
$(d_i, d_{i+1}) \in s^{\mathcal{J}}$ for $1 \leq i < n$ and $d_i \neq d_1$ for each $i$ with $1 < i < n$. Then it is not hard
to see that, because $\{w' \mid (a,w') \in \Delta^{\mathcal{I}}\}$ is a tree and $w \neq \varepsilon$, we have $d_2 = d_{n-1}$. Since
$(d_1, d_2) \in s^{\mathcal{J}}$ and $(d_{n-1}, d_n) \in s^{\mathcal{J}}$ with $d_{n-1} = d_2$ and $d_n = d_1$, the role $s$ and the element
$d = d_2$ are as required. For each $r(t,t) \in q_{sr}$ with $t \notin R$, select an element $d_{r,t}$ and a role
$s_{r,t}$ as described above. Now let $q_{\ell r}$ be obtained from $q_{sr}$ by doing the following for each
$r(t,t) \in q_{sr}$ with $t \notin R$:

• if $d_{r,t} = \pi_{sr}(t')$ for some $t' \in \mathrm{Terms}(q_{sr})$, then replace $r(t,t)$ with $s_{r,t}(t,t')$ and $s_{r,t}(t',t)$;

• otherwise, introduce a new variable $v_{r,t} \in N_V$ and replace $r(t,t)$ with $s_{r,t}(t, v_{r,t})$ and
$s_{r,t}(v_{r,t}, t)$.

Let $\pi_{\ell r}$ be obtained from $\pi_{sr}$ by extending it with $\pi_{\ell r}(v_{r,t}) = d_{r,t}$ for each newly introduced
variable $v_{r,t}$. By definition of $q_{\ell r}$ and $\pi_{\ell r}$, $q_{\ell r}$ is connected, $\pi_{\ell r}$ is injective modulo $\stackrel{*}{\approx}$, and
$\mathcal{I} \models^{\pi_{\ell r}} q_{\ell r}$. □

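Lemma (18). Let $\mathcal{I}$ be a model of $\mathcal{K}$.

1. If $\mathcal{I}$ is canonical and $\mathcal{I} \models q$, then there is a pair $(q_{fr}, R) \in \mathrm{fr}_{\mathcal{K}}(q)$ such that $\mathcal{I} \models^{\pi_{fr}} q_{fr}$
for a forest match $\pi_{fr}$, $R$ is the induced root splitting of $\pi_{fr}$, and $\pi_{fr}$ is an injection
modulo $\stackrel{*}{\approx}$.

2. If $(q_{fr}, R) \in \mathrm{fr}_{\mathcal{K}}(q)$ and $\mathcal{I} \models^{\pi_{fr}} q_{fr}$ for some match $\pi_{fr}$, then $\mathcal{I} \models q$.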



Proof of Lemma 18. The proof of (2) is again analogous to the one given in Lemma 16. For
(1), let $(q_{\ell r}, R) \in \mathrm{lr}_{\mathcal{K}}(q)$ be such that $\mathcal{I} \models^{\pi_{\ell r}} q_{\ell r}$, $R$ is the root splitting induced by $\pi_{\ell r}$, $\pi_{\ell r}$
is injective modulo $\stackrel{*}{\approx}$ and, for each $r(t,t) \in q_{\ell r}$, $t \in R$. Such a loop rewriting and match $\pi_{\ell r}$
exist due to Lemma 17 and the canonicity of $\mathcal{I}$. By definition, $R$ is a root splitting w.r.t.
$q_{\ell r}$ and $\mathcal{K}$.

For $w, w' \in \mathbb{N}^*$, the longest common prefix (LCP) of $w, w'$ is the longest $w^* \in \mathbb{N}^*$ such
that $w^*$ is a prefix of both $w$ and $w'$. For the match $\pi_{\ell r}$ we now define the set $D$ as follows:

$$D := \mathrm{ran}(\pi_{\ell r}) \cup \{(a,w) \in \Delta^{\mathcal{I}} \mid w \text{ is the LCP of some } w', w'' \text{ with } (a,w'), (a,w'') \in \mathrm{ran}(\pi_{\ell r})\}.$$

Let $V \subseteq N_V \setminus \mathrm{Vars}(q_{\ell r})$ be such that, for each $d \in D \setminus \mathrm{ran}(\pi_{\ell r})$, there is a unique $v_d \in V$.
We now define a mapping $\pi_{fr}$ as $\pi_{\ell r} \cup \{v_d \in V \mapsto d\}$. By definition of $V$ and $v_d$, $\pi_{fr}$ is a
split match as well. The set $V \cup \mathrm{Vars}(q_{\ell r})$ will be the set of variables for the new query $q_{fr}$.
Note that $\mathrm{ran}(\pi_{fr}) = D$.

Fact (a) if $(a,w), (a,w') \in \mathrm{ran}(\pi_{fr})$, then $(a,w'') \in \mathrm{ran}(\pi_{fr})$, where $w''$ is the LCP of $w$
and $w'$;

Fact (b) $\sharp(V) \leq \sharp(\mathrm{Vars}(q_{\ell r}))$. (Because, in the worst case, all $(a,w)$ in $\mathrm{ran}(\pi_{\ell r})$ are
"incomparable" and can thus be seen as leaves of a binarily branching tree. Now, a tree that
has $n$ leaves and is at least binarily branching at every non-leaf has at most $n$ inner
nodes, and thus $\sharp(V) \leq \sharp(\mathrm{Vars}(q_{\ell r}))$.)

For a pair of individuals $d, d' \in \Delta^{\mathcal{I}}$, the path from $d$ to $d'$ is the (unique) shortest sequence
of elements $d_1, \ldots, d_n \in \Delta^{\mathcal{I}}$ such that $d_1 = d$, $d_n = d'$, and $d_{i+1}$ is a neighbor of $d_i$ for all
$1 \leq i < n$. The length of a path is the number of elements in it, i.e., the path $d_1, \ldots, d_n$ is
of length $n$. The relevant path $d'_1, \ldots, d'_\ell$ from $d$ to $d'$ is the sub-sequence of $d_1, \ldots, d_n$ that
is obtained by dropping all elements $d_i \notin D$.
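
Facts (a) and (b) above can be checked on small examples with the following sketch. It assumes that elements are encoded as pairs of an individual name and a tuple of integers; the names `lcp` and `lcp_closure` are hypothetical helpers, not notation from the proof.

```python
def lcp(w1, w2):
    """Longest common prefix of two words given as tuples."""
    prefix = []
    for x, y in zip(w1, w2):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def lcp_closure(ran):
    """Return the range extended with the LCPs of all pairs in the same tree.

    ran: set of (a, w) pairs, with a an individual name and w a tuple of ints.
    """
    closed = set(ran)
    for a, w1 in ran:
        for b, w2 in ran:
            if a == b:
                closed.add((a, lcp(w1, w2)))
    return closed
```

On a range with four elements, of which three lie in the same tree, the closure adds two elements, so the number of added elements stays below the number of original ones, as Fact (b) states; applying the closure a second time adds nothing, which mirrors Fact (a).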

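Claim 1. Let $r(t,t') \in \mathrm{subq}(q_{\ell r}, t_r)$ for some $t_r \in R$ and let $d'_1, \ldots, d'_\ell$ be the relevant path
from $d = d'_1 = \pi_{\ell r}(t)$ to $d' = d'_\ell = \pi_{\ell r}(t')$. If $\ell > 2$, there is a role $s \in \mathrm{Trans}_{\mathcal{R}}$ such that
$s \sqsubseteq^*_{\mathcal{R}} r$ and $(d'_i, d'_{i+1}) \in s^{\mathcal{I}}$ for all $1 \leq i < \ell$.

Proof. Let $d_1, \ldots, d_n$ be the path and $d'_1, \ldots, d'_\ell$ the relevant path from $\pi_{\ell r}(t)$ to $\pi_{\ell r}(t')$.
Then $\ell > 2$ implies $n > 2$. We have to show that there is a role $s$ as in the claim. Let $\mathcal{J}$
be a forest base for $\mathcal{I}$. Since $\mathcal{I} \models^{\pi_{\ell r}} q_{\ell r}$, $n > 2$ implies $(\pi_{\ell r}(t), \pi_{\ell r}(t')) \in r^{\mathcal{I}} \setminus r^{\mathcal{J}}$. Since $\mathcal{I}$ is
based on $\mathcal{J}$, it follows that there is an $s \in \mathrm{Trans}_{\mathcal{R}}$ such that $s \sqsubseteq^*_{\mathcal{R}} r$, and $(d_i, d_{i+1}) \in s^{\mathcal{J}}$ for
all $1 \leq i < n$. By construction of $\mathcal{I}$ from $\mathcal{J}$, it follows that $(d'_i, d'_{i+1}) \in s^{\mathcal{I}}$ for all $1 \leq i < \ell$,
which finishes the proof of the claim.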

Now let $q_{fr}$ be obtained from $q_{\ell r}$ as follows: for each role atom $r(t,t') \in \mathrm{subq}(q_{\ell r}, t_r)$ with
$t_r \in R$, if the length of the relevant path $d'_1, \ldots, d'_\ell$ from $d = d'_1 = \pi_{\ell r}(t)$ to $d' = d'_\ell = \pi_{\ell r}(t')$
is greater than 2, then select a role $s$ and variables $t_j$ such that $\pi_{fr}(t_j) = d'_j \in D$ as in
Claim 1 and replace the atom $r(t,t')$ with $s(t_1, t_2), \ldots, s(t_{\ell-1}, t_\ell)$, where $t = t_1$, $t' = t_\ell$.
Please note that these $t_j$ can be chosen in a "don't care" non-deterministic way since $\pi_{fr}$ is
injective modulo $\stackrel{*}{\approx}$, i.e., if $\pi_{fr}(t_j) = d'_j = \pi_{fr}(t'_j)$, then $t_j \stackrel{*}{\approx} t'_j$ and we can pick any of these.

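We now have to show that

(i) $\mathcal{I} \models^{\pi_{fr}} q_{fr}$, and

(ii) $\pi_{fr}$ is a forest match.

For (i), let $r(t,t') \in q_{\ell r} \setminus q_{fr}$ and let $s(t_1, t_2), \ldots, s(t_{\ell-1}, t_\ell)$ be the atoms that replaced
$r(t,t')$. Since $\mathcal{I} \models^{\pi_{\ell r}} q_{\ell r}$, $\mathcal{I} \models^{\pi_{\ell r}} r(t,t')$ and $(\pi_{\ell r}(t), \pi_{\ell r}(t')) \in r^{\mathcal{I}}$. Since $r(t,t')$ was replaced
in $q_{fr}$, the length of the relevant path from $\pi_{\ell r}(t)$ to $\pi_{\ell r}(t')$ is greater than 2. Hence, it must
be the case that $(\pi_{\ell r}(t), \pi_{\ell r}(t')) \in r^{\mathcal{I}} \setminus r^{\mathcal{J}}$. Let $d_1, \ldots, d_n$ with $d_1 = \pi_{\ell r}(t)$ and $d_n = \pi_{\ell r}(t')$
be the path from $\pi_{\ell r}(t)$ to $\pi_{\ell r}(t')$ and $d'_1, \ldots, d'_\ell$ the relevant path from $\pi_{\ell r}(t)$ to $\pi_{\ell r}(t')$. By
construction of $\mathcal{I}$ from $\mathcal{J}$, this means that there is a role $s \in \mathrm{Trans}_{\mathcal{R}}$ such that $s \sqsubseteq^*_{\mathcal{R}} r$ and
$(d_i, d_{i+1}) \in s^{\mathcal{J}}$ for all $1 \leq i < n$. Again by construction of $\mathcal{I}$, this means $(d'_i, d'_{i+1}) \in s^{\mathcal{I}}$ for
$1 \leq i < \ell$ as required. Hence $\mathcal{I} \models^{\pi_{fr}} s(t_i, t_{i+1})$ for each $i$ with $1 \leq i < \ell$ by definition of $\pi_{fr}$.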

For (ii): the mapping $\pi_{fr}$ differs from $\pi_{\ell r}$ only for the newly introduced variables.
Furthermore, we only introduced new role atoms within a sub-query $\mathrm{subq}(q_{\ell r}, t_r)$ and $\pi_{\ell r}$
is a split match by assumption. Hence, $\pi_{fr}$ is trivially a split match and we only have to
show that $\pi_{fr}$ is a forest match. Since $\pi_{fr}$ is a split match, we can do this "tree by tree".

For each $a \in \mathrm{Inds}(\mathcal{A})$, let $T_a := \{w \mid (a,w) \in \mathrm{ran}(\pi_{fr})\}$. We need to construct a mapping
$f$ as specified in Definition 14, and we start with its root $t_r$. If $T_a \neq \emptyset$, let $t_r \in \mathrm{Terms}(q)$
be the unique term such that $\pi_{fr}(t_r) = (a, w_r)$ and there is no $t \in \mathrm{Terms}(q)$ such that
$\pi_{fr}(t) = (a,w)$ and $w$ is a proper prefix of $w_r$. Such a term exists since $\pi_{fr}$ is a split match
and it is unique due to Fact (a) above. Define a trace to be a sequence $\bar{w} = w_1 \cdots w_n \in T_a^+$
such that

• $w_1 = w_r$;

• for all $1 \leq i < n$, $w_i$ is the longest proper prefix of $w_{i+1}$.

Since $\mathcal{I}$ is canonical, each $w_i \in T_a$ is in $\mathbb{N}^*$. It is not hard to see that $T = \{\bar{w} \mid \bar{w} \text{ is a trace}\} \cup
\{\varepsilon\}$ is a tree. For a trace $\bar{w} = w_1 \cdots w_n$, let $\mathrm{Tail}(\bar{w}) = w_n$. Define a mapping $f$ that maps
each term $t$ with $\pi_{fr}(t) = (a,w) \in T_a$ to the unique trace $\bar{w}_t$ such that $w = \mathrm{Tail}(\bar{w}_t)$. Let
$r(t,t') \in q_{fr}$ such that $\pi_{fr}(t), \pi_{fr}(t') \in T_a$. By construction of $q_{fr}$, this implies that the
length of the relevant path from $\pi_{fr}(t)$ to $\pi_{fr}(t')$ is exactly 2. Thus, $f(t)$ and $f(t')$ are
neighbors in $T$ and, hence, $\pi_{fr}$ is a forest match as required. □

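Theorem (19). Let $\mathcal{K}$ be a $\mathcal{SHIQ}$ knowledge base, $q$ a Boolean conjunctive query, and
$\{q_1, \ldots, q_\ell\} = \mathrm{trees}_{\mathcal{K}}(q) \cup \mathrm{ground}_{\mathcal{K}}(q)$. Then $\mathcal{K} \models q$ iff $\mathcal{K} \models q_1 \vee \ldots \vee q_\ell$.

Proof of Theorem 19. For the "if" direction: let us assume that $\mathcal{K} \models q_1 \vee \ldots \vee q_\ell$. Hence,
for each model $\mathcal{I}$ of $\mathcal{K}$, there is a query $q_i$ with $1 \leq i \leq \ell$ such that $\mathcal{I} \models q_i$. We distinguish
two cases: (i) $q_i \in \mathrm{trees}_{\mathcal{K}}(q)$ and (ii) $q_i \in \mathrm{ground}_{\mathcal{K}}(q)$.

For (i): $q_i$ is of the form $C(v)$ where $C$ is the query concept for some query $q_{fr}$ w.r.t.
$v \in \mathrm{Vars}(q_{fr})$ and $(q_{fr}, \emptyset) \in \mathrm{fr}_{\mathcal{K}}(q)$. Hence $\mathcal{I} \models^{\pi} q_i$ for some match $\pi$, and thus $\mathcal{I} \models^{\pi} C(v)$.
Let $d \in \Delta^{\mathcal{I}}$ with $d = \pi(v) \in C^{\mathcal{I}}$. By Lemma 12, we then have that $\mathcal{I} \models q_{fr}$ and, by
Lemma 18, we then have that $\mathcal{I} \models q$ as required.

For (ii): since $q_i \in \mathrm{ground}_{\mathcal{K}}(q)$, there is some pair $(q_{fr}, R) \in \mathrm{fr}_{\mathcal{K}}(q)$ such that $q_i =
\mathrm{ground}(q_{fr}, R, \tau)$. We show that $\mathcal{I} \models^{\pi_{fr}} q_{fr}$ for some match $\pi_{fr}$. Since $\mathcal{I} \models q_i$, there
is a match $\pi_i$ such that $\mathcal{I} \models^{\pi_i} q_i$. We now construct the match $\pi_{fr}$. For each $t \in R$,
$q_i$ contains a concept atom $C(\tau(t))$ where $C = \mathrm{con}(\mathrm{subq}(q_{fr}, t), t)$ is the query concept of
$\mathrm{subq}(q_{fr}, t)$ w.r.t. $t$. Since $\mathcal{I} \models^{\pi_i} C(\tau(t))$ and by Lemma 12, there is a match $\pi_t$ such that
$\mathcal{I} \models^{\pi_t} \mathrm{subq}(q_{fr}, t)$. We now define $\pi_{fr}$ as the union of $\pi_t$, for each $t \in R$. Please note that
$\pi_{fr}(t) = \pi_i(\tau(t))$. Since $\mathrm{Inds}(q_{fr}) \subseteq R$ and $\tau$ is such that, for each $a \in \mathrm{Inds}(q_{fr})$, $\tau(a) = a$
and $\tau(t) = \tau(t')$ iff $t \stackrel{*}{\approx} t'$, it follows that $\mathcal{I} \models^{\pi_{fr}} at$ for each atom $at \in q_{fr}$ such that $at$
contains only terms from the root choice $R$ and hence $\mathcal{I} \models^{\pi_{fr}} q_{fr}$ as required.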

For the "only if" direction we have to show that, if $\mathcal{K} \models q$, then $\mathcal{K} \models q_1 \vee \ldots \vee q_\ell$, so let
us assume that $\mathcal{K} \models q$. By Lemma 7 in its negated form we have that $\mathcal{K} \models q$ iff all canonical
models $\mathcal{I}$ of $\mathcal{K}$ are such that $\mathcal{I} \models q$. Hence, we can restrict our attention to the canonical
models of $\mathcal{K}$. By Lemma 18, $\mathcal{I} \models \mathcal{K}$ and $\mathcal{I} \models q$ implies that there is a pair $(q_{fr}, R) \in \mathrm{fr}_{\mathcal{K}}(q)$
such that $\mathcal{I} \models^{\pi_{fr}} q_{fr}$ for a forest match $\pi_{fr}$, $R$ is the induced root splitting of $\pi_{fr}$, and $\pi_{fr}$
is an injection modulo $\stackrel{*}{\approx}$. We again distinguish two cases:

(i) $R = \emptyset$, i.e., the root splitting is empty and $\pi_{fr}$ is a tree match, and

(ii) $R \neq \emptyset$, i.e., the root splitting is non-empty and $\pi_{fr}$ is a forest match but not a tree
match.

For (i): since $(q_{fr}, \emptyset) \in \mathrm{fr}_{\mathcal{K}}(q)$, there is some $v \in \mathrm{Terms}(q_{fr})$ such that $C = \mathrm{con}(q_{fr}, v)$ and
$q_i = C(v)$. By Lemma 12 and, since $\mathcal{I} \models q_{fr}$, there is an element $d \in \Delta^{\mathcal{I}}$ such that $d \in C^{\mathcal{I}}$.
Hence $\mathcal{I} \models^{\pi} C(v)$ with $\pi : v \mapsto d$ as required.

For (ii): since $R$ is the root splitting induced by $\pi_{fr}$, for each $t \in R$ there is some
$a_t \in \mathrm{Inds}(\mathcal{A})$ such that $\pi_{fr}(t) = (a_t, \varepsilon)$. We now define the mapping $\tau : R \to \mathrm{Inds}(\mathcal{A})$ as
follows: for each $t \in R$, $\tau(t) = a_t$ iff $\pi_{fr}(t) = (a_t, \varepsilon)$. By definition of $\mathrm{ground}(q_{fr}, R, \tau)$,
$q_i = \mathrm{ground}(q_{fr}, R, \tau) \in \mathrm{ground}_{\mathcal{K}}(q)$. Since $\mathcal{I} \models^{\pi_{fr}} q_{fr}$, $\mathcal{I} \models \mathrm{subq}(q_{fr}, t)$ for each $t \in R$.
Since $q_{fr}$ is forest-shaped, each $\mathrm{subq}(q_{fr}, t)$ is tree-shaped. Then, by Lemma 12, $\mathcal{I} \models q'_i$,
where $q'_i$ is the query obtained from $q_{fr}$ by replacing each sub-query $\mathrm{subq}(q_{fr}, t)$ with $C(t)$
for $C$ the query concept of $\mathrm{subq}(q_{fr}, t)$ w.r.t. $t$. By definition of $\tau$ from the forest match
$\pi_{fr}$, it is clear that $\mathcal{I} \models \mathrm{ground}(q_{fr}, R, \tau)$ as required. □

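Lemma (20). Let $q$ be a Boolean conjunctive query, $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ a $\mathcal{SHIQ}$ knowledge
base, $|q| := n$ and $|\mathcal{K}| := m$. Then there is a polynomial $p$ such that

1. $\sharp(\mathrm{co}(q)) \leq 2^{p(n)}$ and, for each $q' \in \mathrm{co}(q)$, $|q'| \leq p(n)$,

2. $\sharp(\mathrm{sr}_{\mathcal{K}}(q)) \leq 2^{p(n) \cdot \log p(m)}$, and, for each $q' \in \mathrm{sr}_{\mathcal{K}}(q)$, $|q'| \leq p(n)$,

3. $\sharp(\mathrm{lr}_{\mathcal{K}}(q)) \leq 2^{p(n) \cdot \log p(m)}$, and, for each $q' \in \mathrm{lr}_{\mathcal{K}}(q)$, $|q'| \leq p(n)$,

4. $\sharp(\mathrm{fr}_{\mathcal{K}}(q)) \leq 2^{p(n) \cdot \log p(m)}$, and, for each $q' \in \mathrm{fr}_{\mathcal{K}}(q)$, $|q'| \leq p(n)$,

5. $\sharp(\mathrm{trees}_{\mathcal{K}}(q)) \leq 2^{p(n) \cdot \log p(m)}$, and, for each $q' \in \mathrm{trees}_{\mathcal{K}}(q)$, $|q'| \leq p(n)$, and

6. $\sharp(\mathrm{ground}_{\mathcal{K}}(q)) \leq 2^{p(n) \cdot \log p(m)}$, and, for each $q' \in \mathrm{ground}_{\mathcal{K}}(q)$, $|q'| \leq p(n)$.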

Proof of Lemma 20. 

1. The set $\mathrm{co}(q)$ contains those queries obtained from $q$ by adding at most $n$ equality
atoms to $q$. The number of collapsings corresponds, therefore, to building all equivalence
classes over the terms in $q$ by $\stackrel{*}{\approx}$. Hence, the cardinality of the set $\mathrm{co}(q)$ is at
most exponential in $n$. Since we add at most one equality atom for each pair of terms,
the size of a query $q' \in \mathrm{co}(q)$ is at most $n + n^2$, and $|q'|$ is, therefore, polynomial in $n$.

2. For each of the at most $n$ role atoms, we can choose to do nothing, replace the
atom with two atoms, or with three atoms. For every replacement, we can choose to
introduce a new variable or re-use one of the existing variables. If we introduce a new
variable every time, the new query contains at most $3n$ terms. Since $\mathcal{K}$ can contain at
most $m$ non-simple roles that are a sub-role of a role used in role atoms of $q$, we have
at most $m$ roles to choose from when replacing a role atom. Overall, this gives us at
most $1 + m(3n) + m(3n)(3n)$ choices for each of the at most $n$ role atoms in a query
and, therefore, the number of split rewritings for each query $q' \in \mathrm{co}(q)$ is polynomial
in $m$ and exponential in $n$. In combination with the results from (1), this also shows
that the overall number of split rewritings is polynomial in $m$ and exponential in $n$.

Since we add at most two new role atoms for each of the existing role atoms, the size
of a query $q' \in \mathrm{sr}_{\mathcal{K}}(q)$ is linear in $n$.

3. There are at most $n$ role atoms of the form $r(t,t)$ in a query $q' \in \mathrm{sr}_{\mathcal{K}}(q)$ that could
give rise to a loop rewriting, at most $m$ non-simple sub-roles of $r$ in $\mathcal{K}$ that can be
used in the loop rewriting, and we can introduce at most one new variable for each
role atom $r(t,t)$. Therefore, for each query in $\mathrm{sr}_{\mathcal{K}}(q)$, the number of loop rewritings
is again polynomial in $m$ and exponential in $n$. Combined with the results from (2),
this bound also holds for the cardinality of $\mathrm{lr}_{\mathcal{K}}(q)$.

In a loop rewriting, one role atom is replaced with two role atoms, hence, the size of
a query $q' \in \mathrm{lr}_{\mathcal{K}}(q)$ at most doubles.

4. We can use similar arguments as above in order to derive a bound that is exponential
in $n$ and polynomial in $m$ for the number of forest rewritings in $\mathrm{fr}_{\mathcal{K}}(q)$.

Since the number of role atoms that we can introduce in a forest rewriting is polynomial
in $n$, the size of each query $q' \in \mathrm{fr}_{\mathcal{K}}(q)$ is at most quadratic in $n$.

5. The cardinality of the set $\mathrm{trees}_{\mathcal{K}}(q)$ is clearly also polynomial in $m$ and exponential in
$n$ since each query in $\mathrm{fr}_{\mathcal{K}}(q)$ can contribute at most one query to the set $\mathrm{trees}_{\mathcal{K}}(q)$. It
is not hard to see that the size of a query $q' \in \mathrm{trees}_{\mathcal{K}}(q)$ is polynomial in $n$.

6. By (1)-(4) above, the number of terms in a root splitting is polynomial in $n$ and there
are at most $m$ individual names occurring in $\mathcal{A}$ that can be used for the mapping $\tau$
from terms to individual names. Hence the number of different ground mappings $\tau$ is
at most polynomial in $m$ and exponential in $n$. The number of ground queries that a
single tuple $(q_{fr}, R) \in \mathrm{fr}_{\mathcal{K}}(q)$ can contribute is, therefore, also at most polynomial in $m$
and exponential in $n$. Together with the bound on the number of forest rewritings from
(4), this shows that the cardinality of $\mathrm{ground}_{\mathcal{K}}(q)$ is polynomial in $m$ and exponential
in $n$. Again it is not hard to see that the size of each query $q' \in \mathrm{ground}_{\mathcal{K}}(q)$ is
polynomial in $n$.

□ 
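
As a worked instance of the counting in item 2, the per-atom number of choices can be bounded explicitly. The constant 13 below is only one convenient choice for illustration and assumes $m, n \geq 1$; it is not taken from the proof.

```latex
\[
  \bigl(1 + 3mn + 9mn^{2}\bigr)^{n}
  \;\le\; \bigl(13\,m\,n^{2}\bigr)^{n}
  \;=\; 2^{\,n \log(13\,m\,n^{2})}
  \;=\; 2^{\,n\,(\log 13 \,+\, \log m \,+\, 2\log n)} .
\]
```

The exponent is polynomial in $n$ and only logarithmic in $m$, so the count is exponential in $n$ and polynomial in $m$, which matches the shape $2^{p(n) \cdot \log p(m)}$ of the bounds stated in the lemma for a suitable polynomial $p$.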

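Lemma (23). Let $\mathcal{K}$ be a $\mathcal{SHIQ}$ knowledge base and $q$ a union of connected Boolean
conjunctive queries. The algorithm from Definition 22 answers "$\mathcal{K}$ entails $q$" iff $\mathcal{K} \models q$
under the unique name assumption.

Proof of Lemma 23. For the "only if"-direction: let $q = q_1 \vee \ldots \vee q_\ell$. We show the contrapositive
and assume that $\mathcal{K} \not\models q$. We can assume that $\mathcal{K}$ is consistent since an inconsistent
knowledge base trivially entails every query. Let $\mathcal{I}$ be a model of $\mathcal{K}$ such that $\mathcal{I} \not\models q$. We
show that $\mathcal{I}$ is also a model of some extended knowledge base $\mathcal{K}_q = (\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}, \mathcal{A} \cup \mathcal{A}_q)$.
We first show that $\mathcal{I}$ is a model of $\mathcal{T}_q$. To this end, let $\top \sqsubseteq \neg C$ be in $\mathcal{T}_q$. Then $C(v) \in T$
and $C = \mathrm{con}(q_{fr}, v)$ for some pair $(q_{fr}, \emptyset) \in \mathrm{fr}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathrm{fr}_{\mathcal{K}}(q_\ell)$ and $v \in \mathrm{Vars}(q_{fr})$. Let $i$
be such that $(q_{fr}, \emptyset) \in \mathrm{fr}_{\mathcal{K}}(q_i)$. Now $C^{\mathcal{I}} \neq \emptyset$ implies, by Lemma 12, that $\mathcal{I} \models q_{fr}$ and, by
Lemma 18, $\mathcal{I} \models q_i$ and, hence, $\mathcal{I} \models q$, contradicting our assumption. Thus $\mathcal{I} \models \top \sqsubseteq \neg C$
and, thus, $\mathcal{I} \models \mathcal{T}_q$.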

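Next, we define an extended ABox $\mathcal{A}_q$ such that, for each $q' \in G$,

• if $C(a) \in q'$ and $a^{\mathcal{I}} \in \neg C^{\mathcal{I}}$, then $\neg C(a) \in \mathcal{A}_q$;

• if $r(a,b) \in q'$ and $(a^{\mathcal{I}}, b^{\mathcal{I}}) \notin r^{\mathcal{I}}$, then $\neg r(a,b) \in \mathcal{A}_q$.

Now assume that we can have a query $q' = \mathrm{ground}(q_{fr}, R, \tau) \in \mathrm{ground}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathrm{ground}_{\mathcal{K}}(q_\ell)$
such that there is no atom $at \in q'$ with $\neg at \in \mathcal{A}_q$. Then trivially $\mathcal{I} \models q'$. Let $i$ be such that
$(q_{fr}, R) \in \mathrm{fr}_{\mathcal{K}}(q_i)$. By Theorem 19, $\mathcal{I} \models q_i$ and thus $\mathcal{I} \models q$, which is a contradiction. Hence
$\mathcal{K}_q$ is an extended knowledge base and $\mathcal{I} \models \mathcal{K}_q$ as required.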

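For the "if"-direction, we assume that $\mathcal{K} \models q$, but the algorithm answers "$\mathcal{K}$ does not
entail $q$". Hence there is an extended knowledge base $\mathcal{K}_q = (\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}, \mathcal{A} \cup \mathcal{A}_q)$ that is
consistent, i.e., there is a model $\mathcal{I}$ such that $\mathcal{I} \models \mathcal{K}_q$. Since $\mathcal{K}_q$ is an extension of $\mathcal{K}$,
$\mathcal{I} \models \mathcal{K}$. Moreover, we have that $\mathcal{I} \models \mathcal{T}_q$ and hence, for each $d \in \Delta^{\mathcal{I}}$, $d \in \neg C^{\mathcal{I}}$ for each
$C(v) \in \mathrm{trees}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathrm{trees}_{\mathcal{K}}(q_\ell)$. By Lemma 12, we then have that $\mathcal{I} \not\models q'$ for each
$q' \in \mathrm{trees}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathrm{trees}_{\mathcal{K}}(q_\ell)$ and, by Lemma 18, $\mathcal{I} \not\models q_i$ for each $i$ with $1 \leq i \leq \ell$.
By definition of extended knowledge bases, $\mathcal{A}_q$ contains an assertion $\neg at$ for at least one
atom $at$ in each query $q' = \mathrm{ground}(q_{fr}, R, \tau)$ from $\mathrm{ground}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathrm{ground}_{\mathcal{K}}(q_\ell)$. Hence
$\mathcal{I} \not\models q'$ for each $q' \in \mathrm{ground}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathrm{ground}_{\mathcal{K}}(q_\ell)$. Then, by Theorem 19, $\mathcal{I} \not\models q$, which
contradicts our assumption. □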

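Lemma (25). Let $\mathcal{R}$ be a role hierarchy, and $r_1, \ldots, r_n$ roles. For every interpretation $\mathcal{I}$
such that $\mathcal{I} \models \mathcal{R}$, it holds that $(\uparrow(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R}))^{\mathcal{I}} = (r_1 \sqcap \ldots \sqcap r_n)^{\mathcal{I}}$.

Proof of Lemma 25. The proof is a straightforward extension of Lemma 6.19 by Tobies
(2001). By definition, $\uparrow(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R}) = \uparrow(r_1, \mathcal{R}) \sqcap \ldots \sqcap \uparrow(r_n, \mathcal{R})$ and, by definition
of the semantics of role conjunctions, we have that $(\uparrow(r_1, \mathcal{R}) \sqcap \ldots \sqcap \uparrow(r_n, \mathcal{R}))^{\mathcal{I}} =
\uparrow(r_1, \mathcal{R})^{\mathcal{I}} \cap \ldots \cap \uparrow(r_n, \mathcal{R})^{\mathcal{I}}$. If $s \sqsubseteq^*_{\mathcal{R}} r$, then $\{s' \mid r \sqsubseteq^*_{\mathcal{R}} s'\} \subseteq \{s' \mid s \sqsubseteq^*_{\mathcal{R}} s'\}$ and hence
$\uparrow(s, \mathcal{R})^{\mathcal{I}} \subseteq \uparrow(r, \mathcal{R})^{\mathcal{I}}$. If $\mathcal{I} \models \mathcal{R}$, then $r^{\mathcal{I}} \subseteq s^{\mathcal{I}}$ for every $s$ with $r \sqsubseteq^*_{\mathcal{R}} s$. Hence, $\uparrow(r, \mathcal{R})^{\mathcal{I}} = r^{\mathcal{I}}$
and

$$(\uparrow(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R}))^{\mathcal{I}} = (\uparrow(r_1, \mathcal{R}) \sqcap \ldots \sqcap \uparrow(r_n, \mathcal{R}))^{\mathcal{I}} = \uparrow(r_1, \mathcal{R})^{\mathcal{I}} \cap \ldots \cap \uparrow(r_n, \mathcal{R})^{\mathcal{I}} = r_1^{\mathcal{I}} \cap \ldots \cap r_n^{\mathcal{I}} = (r_1 \sqcap \ldots \sqcap r_n)^{\mathcal{I}}$$

as required. □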
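
The statement of Lemma 25 can be checked on a toy example. The sketch below assumes that the closure of a role under a hierarchy collects all of its super-roles, that a role hierarchy is given as a set of (sub-role, super-role) pairs, and that inverse roles are ignored; the names `super_roles`, `up`, and `interpret_conjunction` are hypothetical helpers introduced for illustration.

```python
def super_roles(role, hierarchy):
    """All roles s with role ⊑* s; hierarchy is a set of (sub, super) pairs."""
    seen, stack = {role}, [role]
    while stack:
        current = stack.pop()
        for sub, sup in hierarchy:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def up(roles, hierarchy):
    """Closure of a role conjunction: the union of all super-role sets."""
    closed = set()
    for role in roles:
        closed |= super_roles(role, hierarchy)
    return closed


def interpret_conjunction(roles, interpretation):
    """Extension of a role conjunction: intersection of the role extensions."""
    return set.intersection(*(set(interpretation[r]) for r in roles))
```

In an interpretation that satisfies the hierarchy, every super-role extension contains the extension of its sub-role, so intersecting with the additional super-roles does not remove any pair.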

Lemma (28). Given a $\mathcal{SHIQ}^{\sqcap}$ knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ where $m := |\mathcal{K}|$ and the size
of the longest role conjunction is $n$, we can decide consistency of $\mathcal{K}$ in deterministic time
$2^{p(m) 2^{p(n)}}$ with $p$ a polynomial.

Proof of Lemma 28. We first translate $\mathcal{K}$ into an $\mathcal{ALCQI}b$ knowledge base $\mathrm{tr}(\mathcal{K}, \mathcal{R}) =
(\mathrm{tr}(\mathcal{T}, \mathcal{R}), \mathrm{tr}(\mathcal{A}, \mathcal{R}))$. Since the longest role conjunction is of size $n$, the cardinality of each
set $\mathrm{tc}(R, \mathcal{R})$ for a role conjunction $R$ is bounded by $m^n$. Hence, the TBox $\mathrm{tr}(\mathcal{T}, \mathcal{R})$ can
contain exponentially many axioms in $n$. It is not hard to check that the size of each axiom
is polynomial in $m$. Since deciding whether an $\mathcal{ALCQI}b$ KB is consistent is an ExpTime-complete
problem (even with binary coding of numbers) (Tobies, 2001, Theorem 4.42), the
consistency of $\mathrm{tr}(\mathcal{K}, \mathcal{R})$ can be checked in time $2^{p(m) 2^{p(n)}}$. □

Lemma (29). Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a $\mathcal{SHIQ}$ knowledge base with $m := |\mathcal{K}|$ and $q$ a
union of connected Boolean conjunctive queries with $n := |q|$. The algorithm given in
Definition 22 decides whether $\mathcal{K} \models q$ under the unique name assumption in deterministic
time in $2^{p(m) 2^{p(n)}}$.

Proof of Lemma 29. We first show that there is some polynomial $p$ such that we have
to check at most $2^{p(m) 2^{p(n)}}$ extended knowledge bases for consistency and then that each
consistency check can be done in time $2^{p(m) 2^{p(n)}}$, which gives an upper bound of $2^{p(m) 2^{p(n)}}$
on the time needed for deciding whether $\mathcal{K} \models q$.

Let $q := q_1 \vee \ldots \vee q_\ell$. Clearly, we can use $n$ as a bound for $\ell$, i.e., $\ell \leq n$. Moreover, the
size of each query $q_i$ with $1 \leq i \leq \ell$ is bounded by $n$. Together with Lemma 20, we get that
$\sharp(T)$ and $\sharp(G)$ are bounded by $2^{p(n) \cdot \log p(m)}$ for some polynomial $p$ and it is clear that the
sets can be computed in this time bound as well. The size of each query $q' \in G$ w.r.t. an
ABox $\mathcal{A}$ is polynomial in $n$ and, when constructing $\mathcal{A}_q$, we can add a subset of (negated)
atoms from each $q' \in G$ to $\mathcal{A}_q$. Hence, there are at most $2^{p(m) 2^{p(n)}}$ extended ABoxes $\mathcal{A}_q$
and, therefore, $2^{p(m) 2^{p(n)}}$ extended knowledge bases that have to be tested for consistency.

Due to Lemma 20 (5), the size of each query $q' \in T$ is polynomial in $n$. Computing a
query concept $C_{q'}$ of $q'$ w.r.t. some variable $v \in \mathrm{Vars}(q')$ can be done in time polynomial
in $n$. Thus the TBox $\mathcal{T}_q$ can be computed in time $2^{p(n) \cdot \log p(m)}$. The size of an extended
ABox $\mathcal{A}_q$ is maximal if we add, for each of the $2^{p(n) \cdot \log p(m)}$ ground queries in $G$, all atoms
in their negated form. Since, by Lemma 20 (6), the size of these queries is polynomial in $n$,
the size of each extended ABox $\mathcal{A}_q$ is bounded by $2^{p(n) \cdot \log p(m)}$ and it is clear that we can
compute an extended ABox in this time bound as well. Hence, the size of each extended
KB $\mathcal{K}_q = (\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}, \mathcal{A} \cup \mathcal{A}_q)$ is bounded by $2^{p(n) \cdot \log p(m)}$. Since role conjunctions occur only
in $\mathcal{T}_q$ or $\mathcal{A}_q$, and the size of each concept in $\mathcal{T}_q$ and $\mathcal{A}_q$ is polynomial in $n$, the length of the
longest role conjunction is also polynomial in $n$.

When translating an extended knowledge base into an $\mathcal{ALCQI}b$ knowledge base, the
number of axioms resulting from each concept $C$ that occurs in $\mathcal{T}_q$ or $\mathcal{A}_q$ can be exponential
in $n$. Thus, the size of each extended knowledge base is bounded by $2^{p(n) \cdot \log p(m)}$.

Since deciding whether an $\mathcal{ALCQI}b$ knowledge base is consistent is an ExpTime-complete
problem (even with binary coding of numbers) (Tobies, 2001, Theorem 4.42),
it can be checked in time $2^{p(m) 2^{p(n)}}$ if $\mathcal{K}$ is consistent or not.

Since we have to check at most $2^{p(m) 2^{p(n)}}$ knowledge bases for consistency, and each
check can be done in time $2^{p(m) 2^{p(n)}}$, we obtain the desired upper bound of $2^{p(m) 2^{p(n)}}$ for
deciding whether $\mathcal{K} \models q$. □

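Lemma (31). Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a $\mathcal{SHIQ}$ knowledge base and $q$ a union of Boolean
conjunctive queries. $\mathcal{K} \not\models q$ without making the unique name assumption iff there is an
$\mathcal{A}$-partition $\mathcal{K}^{\mathcal{P}} = (\mathcal{T}, \mathcal{R}, \mathcal{A}^{\mathcal{P}})$ and $q^{\mathcal{P}}$ w.r.t. $\mathcal{K}$ and $q$ such that $\mathcal{K}^{\mathcal{P}} \not\models q^{\mathcal{P}}$ under the unique
name assumption.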



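Proof of Lemma 31. For the "only if"-direction: Since $\mathcal{K} \not\models q$, there is a model $\mathcal{I}$ of $\mathcal{K}$
such that $\mathcal{I} \not\models q$. Let $f : \mathrm{Inds}(\mathcal{A}) \to \mathrm{Inds}(\mathcal{A})$ be a total function such that, for each set of
individual names $\{a_1, \ldots, a_n\}$ for which $a_1^{\mathcal{I}} = a_i^{\mathcal{I}}$ for $1 \leq i \leq n$, $f(a_i) = a_1$. Let $\mathcal{A}^{\mathcal{P}}$ and
$q^{\mathcal{P}}$ be obtained from $\mathcal{A}$ and $q$ by replacing each individual name $a$ in $\mathcal{A}$ and $q$ with $f(a)$.
Clearly, $\mathcal{K}^{\mathcal{P}} = (\mathcal{T}, \mathcal{R}, \mathcal{A}^{\mathcal{P}})$ and $q^{\mathcal{P}}$ are an $\mathcal{A}$-partition w.r.t. $\mathcal{K}$ and $q$. Let $\mathcal{I}^{\mathcal{P}} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}^{\mathcal{P}}})$
be an interpretation that is obtained by restricting $\cdot^{\mathcal{I}}$ to individual names in $\mathrm{Inds}(\mathcal{A}^{\mathcal{P}})$. It
is easy to see that $\mathcal{I}^{\mathcal{P}} \models \mathcal{K}^{\mathcal{P}}$ and that the unique name assumption holds in $\mathcal{I}^{\mathcal{P}}$. We now
show that $\mathcal{I}^{\mathcal{P}} \not\models q^{\mathcal{P}}$. Assume, to the contrary of what is to be shown, that $\mathcal{I}^{\mathcal{P}} \models^{\pi'} q^{\mathcal{P}}$ for
some match $\pi'$. We define a mapping $\pi : \mathrm{Terms}(q) \to \Delta^{\mathcal{I}}$ from $\pi'$ such that $\pi(a) = \pi'(f(a))$ for
each individual name $a \in \mathrm{Inds}(q)$ and $\pi(v) = \pi'(v)$ for each variable $v \in \mathrm{Vars}(q)$. It is easy
to see that $\mathcal{I} \models^{\pi} q$, which is a contradiction.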

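For the "if"-direction: Let $\mathcal{I}^{\mathcal{P}} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}^{\mathcal{P}}})$ be such that $\mathcal{I}^{\mathcal{P}} \models \mathcal{K}^{\mathcal{P}}$ under UNA and
$\mathcal{I}^{\mathcal{P}} \not\models q^{\mathcal{P}}$ and let $f \colon \mathsf{Inds}(\mathcal{A}) \to \mathsf{Inds}(\mathcal{A}^{\mathcal{P}})$ be a total function such that $f(a)$ is the individual
that replaced $a$ in $\mathcal{A}^{\mathcal{P}}$ and $q^{\mathcal{P}}$. Let $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ be an interpretation that extends $\mathcal{I}^{\mathcal{P}}$ such
that $a^{\mathcal{I}} = f(a)^{\mathcal{I}^{\mathcal{P}}}$. We show that $\mathcal{I} \models \mathcal{K}$ and that $\mathcal{I} \not\models q$. It is clear that $\mathcal{I} \models \mathcal{T}$. Let
$C(a)$ be an assertion in $\mathcal{A}$ such that $a$ was replaced with $a^{\mathcal{P}}$ in $\mathcal{A}^{\mathcal{P}}$. Since $\mathcal{I}^{\mathcal{P}} \models C(a^{\mathcal{P}})$
and $a^{\mathcal{I}} = f(a)^{\mathcal{I}^{\mathcal{P}}} = (a^{\mathcal{P}})^{\mathcal{I}^{\mathcal{P}}} \in C^{\mathcal{I}^{\mathcal{P}}}$, $\mathcal{I} \models C(a)$. We can use a similar argument for (possibly
negated) role assertions. Let $a \neq b$ be an assertion in $\mathcal{A}$ such that $a$ was replaced with $a^{\mathcal{P}}$
and $b$ with $b^{\mathcal{P}}$ in $\mathcal{A}^{\mathcal{P}}$, i.e., $f(a) = a^{\mathcal{P}}$ and $f(b) = b^{\mathcal{P}}$. Since $\mathcal{I}^{\mathcal{P}} \models a^{\mathcal{P}} \neq b^{\mathcal{P}}$, $a^{\mathcal{I}} = f(a)^{\mathcal{I}^{\mathcal{P}}} =
(a^{\mathcal{P}})^{\mathcal{I}^{\mathcal{P}}} \neq (b^{\mathcal{P}})^{\mathcal{I}^{\mathcal{P}}} = f(b)^{\mathcal{I}^{\mathcal{P}}} = b^{\mathcal{I}}$ and $\mathcal{I} \models a \neq b$ as required. Therefore, we have that $\mathcal{I} \models \mathcal{K}$ as
required.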

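Assume that $\mathcal{I} \models^{\pi} q$ for a match $\pi$. Let $\pi^{\mathcal{P}} \colon \mathsf{Terms}(q^{\mathcal{P}}) \to \Delta^{\mathcal{I}}$ be a mapping such that
$\pi^{\mathcal{P}}(v) = \pi(v)$ for $v \in \mathsf{Vars}(q^{\mathcal{P}})$ and $\pi^{\mathcal{P}}(a^{\mathcal{P}}) = \pi(a)$ for $a^{\mathcal{P}} \in \mathsf{Inds}(q^{\mathcal{P}})$ and some $a$ such that
$a^{\mathcal{P}} = f(a)$. Let $C(a^{\mathcal{P}}) \in q^{\mathcal{P}}$ be such that $C(a) \in q$ and $a$ was replaced with $a^{\mathcal{P}}$, i.e., $f(a) =
a^{\mathcal{P}}$. By assumption, $\pi(a) \in C^{\mathcal{I}}$, but then $\pi(a) = a^{\mathcal{I}} = f(a)^{\mathcal{I}^{\mathcal{P}}} = (a^{\mathcal{P}})^{\mathcal{I}^{\mathcal{P}}} = \pi^{\mathcal{P}}(a^{\mathcal{P}}) \in C^{\mathcal{I}^{\mathcal{P}}}$
and $\mathcal{I}^{\mathcal{P}} \models^{\pi^{\mathcal{P}}} C(a^{\mathcal{P}})$. Similar arguments can be used to show entailment for role and equality
atoms, which yields the desired contradiction. □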
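
The function $f$ in the "only if"-direction picks one fixed representative for each set of
individual names that the model interprets as the same element. A minimal Python sketch of
that construction follows, assuming a finite interpretation of individual names given as a
dictionary. The names `representative_map`, `rename`, and `satisfies_una`, as well as the
encoding of assertions as `(predicate, arguments)` tuples, are hypothetical and chosen here
only for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Assertion = Tuple[str, Tuple[str, ...]]  # hypothetical encoding, e.g. ("C", ("a",))


def representative_map(individuals: Iterable[str],
                       interp: Dict[str, Hashable]) -> Dict[str, str]:
    """Return f with f(a_i) = a_1 whenever a_1 is the first name listed
    that interp maps to the same domain element as a_i."""
    first_seen: Dict[Hashable, str] = {}
    f: Dict[str, str] = {}
    for a in individuals:
        # setdefault stores a as representative only if its element is new
        f[a] = first_seen.setdefault(interp[a], a)
    return f


def rename(assertions: List[Assertion], f: Dict[str, str]) -> List[Assertion]:
    """Replace every individual name a in the assertions by f(a)."""
    return [(pred, tuple(f[a] for a in args)) for pred, args in assertions]


def satisfies_una(names: Iterable[str], interp: Dict[str, Hashable]) -> bool:
    """Check that distinct names are interpreted as distinct elements."""
    names = list(dict.fromkeys(names))  # drop duplicate names, keep order
    return len({interp[a] for a in names}) == len(names)


# Example: a and b denote the same element, c denotes a different one.
individuals = ["a", "b", "c"]
interp = {"a": 1, "b": 1, "c": 2}
f = representative_map(individuals, interp)
abox = [("C", ("a",)), ("r", ("b", "c"))]
abox_p = rename(abox, f)
```

In the example, `b` is replaced by `a`, and the interpretation restricted to the remaining
names `a` and `c` is injective, which mirrors the claim that the unique name assumption
holds in $\mathcal{I}^{\mathcal{P}}$.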

Theorem (35). Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a $\mathcal{SHIQ}$ knowledge base with $m := |\mathcal{K}|$ and $q :=
q_1 \vee \ldots \vee q_\ell$ a union of Boolean conjunctive queries with $n := |q|$. The algorithm given in
Definition 34 decides in non-deterministic time $p(m_a)$ whether $\mathcal{K} \not\models q$ for $m_a := |\mathcal{A}|$ and $p$
a polynomial.



Proof of Theorem 35. Clearly, the size of an ABox $\mathcal{A}^{\mathcal{P}}$ in an $\mathcal{A}$-partition is bounded by $m_a$.
As established in Lemma 32, the maximal size of an extended ABox $\mathcal{A}^{\mathcal{P}}_q$ is polynomial in
$m_a$. Hence, $|\mathcal{A}^{\mathcal{P}} \cup \mathcal{A}^{\mathcal{P}}_q| \le p(m_a)$ for some polynomial $p$. Due to Lemma 20 and since the
size of $q$, $\mathcal{T}$, and $\mathcal{R}$ is fixed by assumption, the sets $\mathsf{trees}_{\mathcal{K}^{\mathcal{P}}}(q_i)$ and $\mathsf{ground}_{\mathcal{K}^{\mathcal{P}}}(q_i)$ for each
$i$ such that $1 \le i \le \ell$ can be computed in time polynomial in $m_a$. From Lemma 29, we
know that the translation of an extended knowledge base into an $\mathcal{ALCQI}b$ knowledge base
is polynomial in $m_a$, and a close inspection of the algorithm by Tobies (2001) for deciding
consistency of an $\mathcal{ALCQI}b$ knowledge base shows that its runtime is also polynomial in
$m_a$. □